This is my course page for the R exercises in the Spatial Analysis and Data Exploration in History and Archaeology course, University of Helsinki, lectured by Eljas Oksanen in the Spring of 2021.
How to create a course page:
You need to have R, RStudio and Git installed and Git linked to your Rstudio before this step.
Create a new GitHub repository
Open RStudio and create a new project by selecting: “File” - “New Project” - “Version Control” - “Git” and paste the web URL in the “Repository URL” box.
Create a web page on GitHub:
Go back to the GitHub repository.
Open “Settings” and scroll to “GitHub Pages”. From “Source” select “master branch”.
Now your course diary web page is online at github_username.github.io/repository_name
In importing the .csv data to R there was a problem with importing European style .csv in which data is separated with ; instead of comma, and decimals are displayed with commas instead of dots. This was resolved by adding [sep=“;”, dec=“,”] when importing the data. I did also need to define the margin size for plots in order to display the results, but otherwise the exercise didn’t pose much difficulty.
# I tested if the file is readable and if my path is correct:
file.exists("~/R/SADE/week1/axes.csv")
## [1] TRUE
# Importing the data
setwd("~/R/SADE/week1")
axes <- read.csv(file="axes.csv", header=TRUE, sep=";", dec=",")
class(axes)
## [1] "data.frame"
head(axes)
## km_number find_type length width thickness weight x y
## 1 32039 kirveet 120.0 40.0 40 420.0 6987601 559225
## 2 33070 kirveet 91.0 50.0 25 331.0 7204072 587496
## 3 35902 kirveet 197.0 75.0 37 620.0 7638696 492759
## 4 39149 kirveet 193.0 82.0 45 1016.0 6941672 467565
## 5 39178 kirveet 174.0 41.0 84 759.8 6802153 328646
## 6 39452 kirveet 99.5 53.5 30 248.1 6748437 371160
summary(axes)
## km_number find_type length width
## Min. :32039 Length:16 Min. : 55.0 Min. : 40.00
## 1st Qu.:39171 Class :character 1st Qu.:110.4 1st Qu.: 54.62
## Median :39599 Mode :character Median :143.0 Median : 70.50
## Mean :38635 Mean :136.8 Mean : 75.97
## 3rd Qu.:40106 3rd Qu.:163.5 3rd Qu.: 92.25
## Max. :40334 Max. :197.0 Max. :151.00
##
## thickness weight x y
## Min. : 9 Min. : 88.1 Min. :6705920 Min. :328646
## 1st Qu.:29 1st Qu.: 317.5 1st Qu.:6746623 1st Qu.:364874
## Median :37 Median : 466.9 Median :6872906 Median :469289
## Mean :36 Mean : 501.4 Mean :6969577 Mean :468313
## 3rd Qu.:40 3rd Qu.: 652.5 3rd Qu.:7201614 3rd Qu.:577979
## Max. :84 Max. :1016.0 Max. :7638696 Max. :587496
## NA's :3
summary(axes$weight)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 88.1 317.5 466.9 501.4 652.5 1016.0
summary(axes$length)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 55.0 110.4 143.0 136.8 163.5 197.0
sd(axes$weight)
## [1] 277.1391
sd(axes$thickness, na.rm=T)
## [1] 17.46425
# Plotting length vs weight
par(mai=c(1,1,1,1))
plot(x=axes$length, y=axes$weight, col="black", pch=16, main="Axe heads recovered by members of the public")
# Boxplot: Axe length and width
boxplot(axes$length, axes$width, ylab="millimeters")
# Barplot: Axe weight, no sorting
barplot(axes$weight, ylab="grams")
# Barplot: Axe weight, sorted
barplot(sort(axes$weight), ylab="grams")
# Histogram: Axe weight, default
hist(axes$weight)
# Histogram: Axe weight, breaks defined
hist(axes$weight, breaks=seq(0, 1200, 100))
# Histogram: Combination
# Note: This works in R, but not in the knitted html page
dev.new(device=pdf, height=6, width=12)
par(mfrow=c(1,2), mai=c(1,1,1,1))
barplot(sort(axes$weight), ylab="grams", names.arg = c(axes$km_number), las=3)
hist(axes$weight)
# Necessary libraries + Finland ESRI Shapefile
library(rgdal)
library(raster)
fin <- readOGR(dsn="finland", layer="finland_valtakunta_simpler")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\Uine\Documents\R\SADE\week1\finland", layer: "finland_valtakunta_simpler"
## with 1 features
## It has 4 fields
## Integer64 fields read as strings: GML_ID
# Summary of Finland ESRI Shapefile
summary(fin)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
## min max
## x 83725.35 732881.9
## y 6637635.87 7776440.6
## Is projected: TRUE
## proj4string :
## [+proj=utm +zone=35 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m
## +no_defs]
## Data attributes:
## GML_ID NATCODE NAMEFIN NAMESWE
## Length:1 Length:1 Length:1 Length:1
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
# Plotting the data with monuments from .csv file
plot(fin, col="lightgrey")
monu <- read.csv(file="monuments_fha.csv", header=TRUE, sep=",")
coordinates(monu) <- ~X+Y
crs(monu) <- crs(proj4string(fin))
points(monu, pch=19, cex=0.1)
points(monu[monu$period == "stone age", ], pch=19, cex=0.1, col="red")
# Importing the data
# Info about readOGR function: ?readOGR
library(rgdal)
library(raster)
library(spatstat)
library(maptools)
setwd("~/R/SADE/week2")
polyg <- readOGR(dsn="englandwales", layer="engwales_simple")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\Uine\Documents\R\SADE\week2\englandwales", layer: "engwales_simple"
## with 1 features
## It has 1 fields
# Digital Elevation Model (DEM) taken from NASA’s Shuttle Radar Topography Mission data
dem <- raster("dem/engwales_dem.tif")
plot(polyg)
plot(dem, add=T)
# Change the color scheme to more intuitive one
plot(dem, add=T, col=terrain.colors(5))
# Loading medieval market sites from Samantha Letters’s (2002) Gazetteer of Markets and Fairs in England and Wales to 1516.
markets <- readOGR(dsn="markets1334", layer="markets1334")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\Uine\Documents\R\SADE\week2\markets1334", layer: "markets1334"
## with 2293 features
## It has 7 fields
points(markets, pch=19, cex=0.2)
class(markets)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"
head(markets, n=10)
## ID MODNAME COUNTY DEFYR BOROUGH VAL_1334 by1200
## 1 28 CANTERBURY KENT 600 1 0.00 1
## 2 7 LONDON MIDDLESEX 650 1 11000.00 1
## 3 34 YORK YORKSHIRE 700 1 1620.00 1
## 4 32 SOUTHAMPTON HAMPSHIRE 750 1 511.17 1
## 5 43 ROCHESTER KENT 821 1 0.00 1
## 6 24 WORCESTER WORCESTERSHIRE 899 1 300.00 1
## 7 8 WALLINGFORD BERKSHIRE 900 1 96.24 1
## 8 57 EXETER DEVON 900 1 366.17 1
## 9 119 HALWELL DEVON 900 1 13.50 1
## 10 29 LYDFORD DEVON 900 1 11.67 1
# Manipulating the data:
# Lets make sure that the two have the same coordinate reference system (CRS) by taking the polygon’s CRS and applying it to the markets.
markets <- spTransform(markets, CRS(proj4string(polyg)))
# Subsetting the markets within the polygon's area to drop out those that fall outside of its borders
markets <- markets[polyg, ]
# Markets are events rather than places - let's save the data that contains duplicates as "market_events" and then remove duplicates from "markets"
market_events <- markets
markets <- remove.duplicates(markets)
# Creating spastat object from markets
sp_markets <- as.ppp(coordinates(markets), as.owin(polyg))
# Kernel Density Estimate (KDE)
dens <- density(sp_markets, sigma=10000, edge=TRUE, eps=500)
plot(dens)
# Computation to determine an “optimal” bandwidth (sigma)
bw.diggle(sp_markets)
## sigma
## 18971.5
# Adjusting color scheme and adding markets as points
plot(dens, col=heat.colors(10))
points(markets, pch=19, cex=0.2)
# Investigating data for attributes that might have overtly strong impact
head(markets, n=20)
## ID MODNAME COUNTY DEFYR BOROUGH VAL_1334 by1200
## 1 28 CANTERBURY KENT 600 1 0.00 1
## 2 7 LONDON MIDDLESEX 650 1 11000.00 1
## 3 34 YORK YORKSHIRE 700 1 1620.00 1
## 4 32 SOUTHAMPTON HAMPSHIRE 750 1 511.17 1
## 5 43 ROCHESTER KENT 821 1 0.00 1
## 6 24 WORCESTER WORCESTERSHIRE 899 1 300.00 1
## 7 8 WALLINGFORD BERKSHIRE 900 1 96.24 1
## 8 57 EXETER DEVON 900 1 366.17 1
## 9 119 HALWELL DEVON 900 1 13.50 1
## 10 29 LYDFORD DEVON 900 1 11.67 1
## 11 72 PILTON DEVON 900 1 10.00 1
## 12 46 BRIDPORT DORSET 900 1 99.71 1
## 13 12 SHAFTESBURY DORSET 900 1 200.00 1
## 14 50 WAREHAM DORSET 900 1 63.50 1
## 15 17 CHRISTCHURCH HAMPSHIRE 900 1 39.75 1
## 16 37 PORTCHESTER HAMPSHIRE 900 1 119.01 1
## 17 16 WINCHESTER HAMPSHIRE 900 1 515.17 1
## 18 33 AXBRIDGE SOMERSET 900 1 45.00 1
## 19 36 BATH SOMERSET 900 1 133.33 1
## 20 92 LANGPORT SOMERSET 900 1 40.00 1
# We find out that mean value has very big spread (up to £11,000)
summary(markets$VAL_1334)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.00 0.00 40.00 70.86 83.92 11000.00
# The wealth of London is statistical outlier and hides the weaker patterns in comparison
boxplot(markets$VAL_1334)
# boxplot.stats gives us the lower whisker or minimum, the interquartile range of the middle (the second and fourth values), the median (third value) and the upper whisker or maximum.
boxplot.stats(markets$VAL_1334)
## $stats
## [1] -1.000 0.000 40.000 83.915 205.000
##
## $n
## [1] 1947
##
## $conf
## [1] 36.99521 43.00479
##
## $out
## [1] 11000.00 1620.00 511.17 300.00 366.17 515.17 220.00 800.00
## [9] 300.00 261.38 358.83 370.50 540.75 1000.00 946.00 913.92
## [17] 604.54 265.44 270.00 466.09 645.25 292.50 227.36 382.88
## [25] 2200.00 280.50 360.00 293.06 1000.00 249.06 349.00 250.00
## [33] 242.50 266.67 210.00 245.75 600.30 390.12 213.19 270.00
## [41] 226.52 244.12 211.62 500.00 268.97 750.00 266.63 255.00
## [49] 333.33 240.52 255.00 262.30 266.63 330.00 409.50 260.00
## [57] 270.00 232.50 232.88 412.13 235.00 300.00 315.00 341.30
## [65] 750.08 398.15 357.61 232.50 221.25 285.00 270.00 1100.00
## [73] 211.04 210.00 495.00 375.00 250.00 225.00 210.00 480.00
## [81] 228.81 257.16 232.50 247.50 532.50 225.00 333.33 450.00
## [89] 450.00 225.00 400.65 315.00 240.00 220.75 247.42 412.50
## [97] 675.00
# Subset to get rid of the high outliers to illustrate "weaker" patterns, original data is saved as markets_backup
markets_backup <- markets
markets_major <- subset(markets, VAL_1334 > 205)
markets <- subset(markets, VAL_1334 < 205)
sp_markets <- as.ppp(coordinates(markets),as.owin(polyg))
# Recalclulating KDE
dens_weighted <- density(sp_markets, sigma=10000,
weights=markets$VAL_1334, edge=TRUE, eps=500)
plot(dens_weighted, col=topo.colors(50))
points(markets, pch=19, cex=0.1)
points(markets_major, pch=15, cex=0.7, col="red")
# Setting up
markets <- subset(markets_backup, DEFYR<1251)
sp_markets <- as.ppp(coordinates(markets),as.owin(polyg))
# Subsetting data
earlier <- as.ppp(coordinates(markets[markets$by1200=="1",]),as.owin(polyg))
later <- as.ppp(coordinates(markets[markets$by1200=="0",]),as.owin(polyg))
multit <- sp_markets
marks(multit) <- as.factor(markets$by1200)
# Plotting the new multitype object
plot(multit)
# Density estimate of all the markets by 1250
dens1250 <- density(sp_markets, sigma=25000, edge=TRUE,
eps=500)
plot(dens1250, col=topo.colors(50))
points(markets, pch=19, cex=0.2)
# Creating relative risk surface that contrasts earlier markets to those founded later
rrs <- relrisk(multit, sigma=25000, edge=TRUE, eps=500)
rrs[as.matrix(dens1250)<(0.000000001)] <- NA
dev.new(device=pdf, height=6, width=6)
plot(polyg)
plot(rrs, col=rev(topo.colors(50)), add=T)
plot(multit, cex=0.3, add=T)
# Plotting side by side still doesn't work in the webpage version... Might need to find other solutions for that!
# Distances between market sites and their closest neighbours
nndist(sp_markets)
## [1] 3560.8988 761.5773 11681.1814 12067.3112 6325.3458 2701.8512
## [7] 806.2258 10022.4747 5658.6217 10673.7997 806.2258 13928.3883
## [13] 11092.3397 6685.0580 12567.4182 3275.6679 11360.0176 2690.7248
## [19] 8237.7181 7060.4532 8202.4387 7723.3412 761.5773 2920.6164
## [25] 5885.5756 9762.1719 10478.5495 2002.4984 7021.3959 7140.0280
## [31] 4712.7487 11720.0683 1131.3708 3807.8866 9100.5494 5787.0545
## [37] 9420.1911 7125.3070 11562.4392 8417.2442 9501.5788 9264.9879
## [43] 7416.8727 12465.1514 10218.1212 9425.4973 3605.5513 14712.2398
## [49] 11260.9946 14090.0674 7864.4771 3492.8498 15781.3181 9167.8787
## [55] 10791.2001 4410.2154 9025.5194 7186.7934 14327.9447 7102.1124
## [61] 12745.5875 10332.4731 6224.1465 6573.4314 509.9020 509.9020
## [67] 806.2258 3757.6588 10720.0746 4494.4410 12149.8971 9827.0036
## [73] 10107.9177 8134.4945 1220.6556 7723.3412 943.3981 3244.9961
## [79] 7467.2619 4632.4939 4775.9816 10176.9347 2915.4759 5557.8773
## [85] 4904.0799 6854.1958 3966.1064 14489.9965 7496.6659 1878.8294
## [91] 17084.7886 4410.2154 3744.3290 8772.6849 4936.5980 9004.9986
## [97] 3640.0549 5408.3269 7605.9187 5366.5631 4967.8969 6888.3960
## [103] 22274.8737 9372.2996 2121.3203 5124.4512 4441.8465 6835.9345
## [109] 4472.1360 3189.0437 7061.1614 7655.7168 11315.9180 3944.6166
## [115] 3613.8622 6627.2166 4527.6926 9104.9437 6989.2775 6824.9542
## [121] 9712.8780 10381.7147 3448.1879 3512.8336 7831.3473 8130.1906
## [127] 6407.8077 10978.6156 3701.3511 1649.2423 3560.8988 6135.1447
## [133] 10252.8045 8198.7804 2236.0680 8099.3827 7657.6759 11101.8017
## [139] 4110.9610 6977.1054 6977.1054 14259.3829 3080.5844 3534.1194
## [145] 6859.3003 3534.1194 5964.0590 2816.0256 9272.0009 11800.4237
## [151] 7360.0272 2765.8633 6774.2158 10507.6163 7134.4236 8628.4413
## [157] 4393.1765 9687.6210 4967.8969 9700.0000 6264.9820 4272.0019
## [163] 2920.6164 943.3981 3584.6897 4327.8170 8415.4620 2002.4984
## [169] 12298.3739 11303.0969 6200.8064 5869.4122 7496.6659 2683.2816
## [175] 6529.9311 13000.0000 8249.2424 21226.8698 14534.4419 16949.6313
## [181] 14933.5193 2765.8633 4188.0783 5412.0237 10673.7997 6754.2579
## [187] 12067.3112 6573.4314 11884.8643 6835.9345 14222.5174 2716.6155
## [193] 7071.0678 7003.5705 5536.2442 11500.4348 7125.3070 9305.9121
## [199] 6768.3085 4916.2994 7580.2375 8490.5830 21497.9069 5420.3321
## [205] 728.0110 17415.5103 4600.0000 4707.4409 9904.5444 4272.0019
## [211] 11503.9124 3757.6588 6609.8411 1104.5361 5249.7619 9642.0952
## [217] 6603.7868 4775.9816 4775.9816 2408.3189 360.5551 9848.8578
## [223] 5161.3952 15305.2279 4300.0000 4632.4939 2900.0000 7071.0678
## [229] 8102.4688 6365.5322 5015.9745 9660.7453 5437.8304 9774.4565
## [235] 10495.7134 4161.7304 5770.6152 806.2258 11088.7330 6107.3726
## [241] 10785.6386 9394.1471 4494.4410 5656.8542 9035.4856 3006.6593
## [247] 10186.7561 5724.5087 7343.0239 6053.9243 6664.0828 5408.3269
## [253] 19780.0404 2915.4759 9386.1600 9702.0616 7782.0306 6529.9311
## [259] 11798.7287 2193.1712 5905.9292 10178.8997 2236.0680 4909.1751
## [265] 12523.9770 8854.3774 6964.1941 11500.4348 7337.5745 8731.5520
## [271] 3901.2818 6390.6181 2701.8512 4472.1360 16609.0337 14408.6779
## [277] 2961.4186 12520.7827 11884.8643 6242.5956 6977.8220 28142.6722
## [283] 15061.5404 11810.5885 3138.4710 18694.6516 2720.2941 7496.6659
## [289] 8246.2113 2408.3189 6325.3458 4785.3944 4770.7442 7580.2375
## [295] 2200.0000 16865.6456 5403.7024 14013.2081 10609.9010 9786.2148
## [301] 11961.6052 13914.0217 11403.9467 9848.8578 7655.7168 4272.0019
## [307] 11112.6055 12998.8461 22588.7140 7653.1039 12523.9770 6390.6181
## [313] 7375.6356 8202.4387 7134.4236 13805.7959 5643.5804 4580.3930
## [319] 9700.0000 10541.8215 3178.0497 3178.0497 5099.0195 11363.5382
## [325] 3701.3511 8127.1151 8357.0330 1878.8294 8381.5273 5142.9563
## [331] 2941.0882 18304.3711 2220.3603 6768.3085 2927.4562 7849.2038
## [337] 8183.5200 2662.7054 12870.1204 7343.7048 3436.5681 7140.0280
## [343] 3014.9627 11360.0176 11335.7840 10044.8992 5162.3638 6965.6299
## [349] 10609.9010 2435.1591 9209.7774 2927.4562 3612.4784 7905.6942
## [355] 14616.7712 3712.1422 4560.7017 7580.2375 3584.6897 13905.3946
## [361] 16949.6313 7725.2832 41618.3854 8324.0615 7273.2386 4327.8170
## [367] 6841.0526 8637.1292 12316.6554 3482.8150 8102.4688 3640.0549
## [373] 3613.8622 5458.9376 316.2278 11637.8692 1220.6556 2884.4410
## [379] 3569.3137 3080.5844 5124.4512 9135.0972 12093.3866 5787.0545
## [385] 5608.0300 10623.0881 3671.5120 15970.5980 2729.4688 1523.1546
## [391] 1523.1546 12851.4591 10978.6156 8183.5200 3940.8121 12349.0890
## [397] 7200.6944 4596.7380 6519.2024 14302.4473 8402.3806 8876.9364
## [403] 10920.1648 5280.1515 11800.4237 1615.5494 7849.2038 3275.6679
## [409] 4565.0849 4701.0637 6627.2166 6440.4969 3436.5681 6551.3357
## [415] 2941.0882 8489.9941 11313.7085 7720.1036 8345.0584 8292.7679
## [421] 1104.5361 6135.1447 10978.6156 2549.5098 2549.5098 9272.0009
## [427] 3492.8498 2683.2816 5015.9745 6860.0292 1941.6488 5813.7767
## [433] 5675.3854 2039.6078 20113.6769 17083.6179 15015.3255 7273.2386
## [439] 5590.1699 10867.3824 10347.9467 15970.5980 9775.4795 7000.0000
## [445] 10728.0007 6177.3781 7988.1162 9802.0406 6041.5230 13579.3962
## [451] 7060.4532 8854.3774 21384.5739 1131.3708 3512.8336 11205.8021
## [457] 6087.6925 6718.6308 7203.4714 2780.2878 7715.5687 5981.6386
## [463] 7209.0221 1581.1388 15781.3181 10466.1359 6685.0580 7576.9387
## [469] 7359.3478 6200.8064 8527.6022 6664.0828 8130.1906 728.0110
## [475] 4964.8766 7615.7731 7102.1124 3828.8379 5830.9519 5053.7115
## [481] 1615.5494 4909.1751 10016.9856 2961.4186 4441.8465 6791.1707
## [487] 7810.2497 4527.6926 6389.0531 12126.4174 1581.1388 1581.1388
## [493] 3569.3137 5408.3269 10400.4808 5841.2327 5984.1457 10245.9748
## [499] 2816.0256 1900.0000 3546.8296 5885.5756 2720.2941 6824.9542
## [505] 6519.2024 9300.5376 2376.9729 7839.0050 4301.1626 5508.1757
## [511] 5420.3321 13787.6757 3448.1879 10207.8401 5412.0237 2039.6078
## [517] 4743.4165 12103.7184 8640.0231 8099.3827 5456.1891 4104.8752
## [523] 4104.8752 7203.4714 3712.1422 7337.5745 1923.5384 5408.3269
## [529] 10241.5819 8895.5045 9617.6920 5000.0000 5215.3619 2561.2497
## [535] 10283.9681 10840.6642 5675.3854 6791.1707 806.2258 7690.2536
## [541] 4429.4469 8538.1497 3228.0025 3244.9961 10937.0928 4743.4165
## [547] 11360.0176 9482.6157 8998.8888 9386.1600 11101.8017 5590.1699
## [553] 7420.2426 2561.2497 2121.3203 5714.0179 7022.8199 3757.6588
## [559] 5536.2442 6801.4704 15652.4758 14241.1376 18298.9071 5661.2719
## [565] 4964.8766 7782.0306 9489.4678 2039.6078 5408.3269 11822.4363
## [571] 9712.8780 9167.8787 3000.0000 6365.5322 4785.3944 4617.3586
## [577] 8747.5711 7378.3467 8185.9636 5220.1533 6685.0580 8188.4064
## [583] 7022.8199 6403.1242 10793.5166 5688.5851 11415.7786 360.5551
## [589] 5714.0179 13173.4582 3689.1733 11381.1247 6942.6220 7217.3402
## [595] 2662.7054 7368.1748 7580.2375 6609.8411 3228.0025 10183.3197
## [601] 5280.1515 5124.4512 6618.9123 3676.9553 3860.0518 7702.5970
## [607] 8402.3806 3465.5447 4300.0000 7702.5970 3546.8296 4763.4021
## [613] 11415.7786 15911.3167 8122.1918 13914.0217 11303.0969 16804.7612
## [619] 15365.2205 7690.2536 8792.0419 7375.6356 10704.2048 11317.6853
## [625] 6841.0526 3448.1879 1581.1388 5930.4300 3982.4616 5508.1757
## [631] 10336.8274 6146.5437 2900.0000 7119.6910 7470.6091 4712.7487
## [637] 5824.0879 5608.0300 10245.9748 4560.7017 3275.6679 3275.6679
## [643] 6185.4668 7496.6659 9386.1600 6403.1242 500.0000 1923.5384
## [649] 9035.4856 4429.4469 6573.4314 12567.4182 9025.5194 3360.0595
## [655] 4924.4289 6107.3726 10212.2475 10555.0936 2941.0882 3569.3137
## [661] 2973.2137 7470.6091 7800.6410 6800.0000 4161.7304 10131.6336
## [667] 6596.9690 6920.2601 2061.5528 9276.3139 7200.6944 10430.7238
## [673] 6888.3960 8080.8415 6236.9865 3138.4710 6041.5230 4936.5980
## [679] 5948.1089 2563.2011 1649.2423 2563.2011 2729.4688 2884.4410
## [685] 5807.7534 5333.8541 11562.4392 7340.9809 9104.9437 5981.6386
## [691] 6306.3460 4272.0019 6920.2601 6573.4314 13978.9127 2720.2941
## [697] 11063.4533 12949.5174 5360.0373 11205.8021 19865.5481 5021.9518
## [703] 3275.6679 3901.2818 35913.5072 5142.9563 7420.2426 8628.4413
## [709] 3448.1879 5590.1699 7596.0516 2061.5528 3828.8379 4825.9714
## [715] 1252.9964 9420.1911 2376.9729 5700.8771 5053.7115 1252.9964
## [721] 11173.1822 5643.5804 8237.7181 8134.4945 5326.3496 6842.5142
## [727] 5246.9038 5824.0879 7831.3473 6802.9405 3689.1733 4580.3930
## [733] 7715.5687 5326.3496 7430.3432 4775.9816 3757.6588 7368.1748
## [739] 7494.6648 2039.6078 6640.7831 5458.9376 16275.4416 6000.0000
## [745] 8570.8809 9424.4363 9353.6089 4301.1626 11101.8017 500.0000
## [751] 3275.6679 4186.8843 6854.1958 5557.8773 5300.9433 13822.4455
## [757] 11086.0272 14241.1376 8122.1918 316.2278 4976.9469 9037.6988
## [763] 9213.0342 3982.4616 3257.2995 8421.9950 3257.2995 1513.2746
## [769] 6706.7131 9425.4973 10751.7440 3671.5120 10183.3197 5408.3269
## [775] 6989.2775 3014.9627 5948.1089 6965.6299 2780.2878 3080.5844
## [781] 4850.7731 7340.9809 5787.0545 3966.1064 9533.6247 6242.5956
## [787] 1941.6488 5161.3952 3023.2433 5916.9249 5021.9518 6389.0531
## [793] 5408.3269 9811.2181 9757.5612 5220.1533 6905.0706 8141.2530
## [799] 7119.6910 9771.8985 6264.9820 5787.0545 10867.3824 5000.0000
## [805] 4565.0849 14489.9965 4294.1821 11652.0384 4850.7731 1900.0000
## [811] 3577.7088 3482.8150 5162.3638 6276.9419 7627.5815 10589.1454
## [817] 10530.9069 6685.0580 3944.6166 7061.1614 2823.1188 7209.0221
## [823] 2720.2941 13225.7325 8236.5041 2435.1591 7343.0239 4712.7487
## [829] 7864.4771 8772.6849 8998.8888 7430.3432 6529.9311 7988.1162
## [835] 2823.1188 806.2258 4527.6926 4976.9469 3006.6593 9761.6597
## [841] 4770.7442 11926.8604 6224.1465 7420.2426 6964.1941 8598.2556
## [847] 5099.0195 11264.1023 4382.9214 3744.3290 1513.2746 4596.7380
## [853] 13905.3946 8631.3383 6946.2220 6000.0000 9104.9437 11573.2450
## [859] 3828.8379 2941.0882 5333.8541 3080.5844 7905.6942 3577.7088
## [865] 3605.5513 6802.9405 3676.9553 6888.3960 2193.1712 9633.7947
## [871] 2927.4562 8229.8238 12480.7852 14449.9135 25784.1036 4904.0799
## [877] 4904.0799 4924.4289 6029.9254 3023.2433 3006.6593 21384.5739
## [883] 9353.6089
# Visualizing the data and taking the mean
hist(nndist(sp_markets), breaks=seq(0, 50000, 1000))
mean(nndist(sp_markets))
## [1] 7152.382
# Clark and Evans test:
# R value of 1.08 tells the sites are slightly dispersed, and p << 0.05 tells us that the result is statistically significant
clarkevans.test(sp_markets, corrections="none")
##
## Clark-Evans test
## No edge correction
## Z-test
##
## data: sp_markets
## R = 1.0819, p-value = 3.206e-06
## alternative hypothesis: two-sided
# Calling back the "markets as events" data from earlier, creating a subset of those that were recorded before 1250
# R value of 0.96 tells us that the data is slightly clustered
market_events1250 <- subset(market_events, DEFYR<1251)
sp_market_events <-
as.ppp(coordinates(market_events1250),as.owin(polyg))
hist(nndist(sp_market_events), breaks=seq(0, 50000, 1000))
mean(nndist(sp_market_events))
## [1] 6117.172
clarkevans.test(sp_market_events, corrections="none")
##
## Clark-Evans test
## No edge correction
## Z-test
##
## data: sp_market_events
## R = 0.96332, p-value = 0.02997
## alternative hypothesis: two-sided
# This final code is not needed in RStudio or the GitHub webpage
# save.image("week2_sade.RData")
# Setting up
setwd("~/R/SADE/week3")
library(rgdal)
library(raster)
library(spatstat)
library(maptools)
library(GISTools)
# Creating example data for Complete Spatial Randomness (CRS)
# Every time "plot" command is run, the pattern changes to a newly generated random distribution
window <- owin(c(0,10000), c(0,10000))
plot(runifpoint(n=1000, win=window))
# Setting seed allows everyone to see the same pattern
set.seed(1)
plot(runifpoint(n=1000, win=window))
# Loading the study area (a 10 x 10 grid)
grid <- readOGR(dsn="grid", layer="grid")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\Uine\Documents\R\SADE\week3\grid", layer: "grid"
## with 100 features
## It has 1 fields
plot(grid)
# Plotting the earlier random distribution to this grid illustrates that there is varying amounts of points in each cell (Poisson distribution)
set.seed(1)
randomp <- as.SpatialPoints.ppp(runifpoint(n=1000, win=grid))
crs(randomp) <- crs(proj4string(grid))
points(randomp, cex=0.5)
# Calculating the number of points in each cell and mean value of points per cell, and plotting histogram to illustrate the distribution
hist(poly.counts(randomp,grid), xlim=c(0, 25))
mean(poly.counts(randomp,grid))
## [1] 10
# Creating a curve for idealised distribution for mean = 10
plot(dpois(x=0:20, lambda=10))
lines(dpois(x=0:20, lambda=10), col="blue")
# Loading the exercise data: a map depicting "nucleated" settlements (e.g. villages and towns)
polyg <- readOGR(dsn="england", layer="england_historic")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\Uine\Documents\R\SADE\week3\england", layer: "england_historic"
## with 1 features
## It has 3 fields
dem <- raster("dem_england/dem_england_historic.tif")
settl <- readOGR(dsn="nucleations", layer="Nucleations")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\Uine\Documents\R\SADE\week3\nucleations", layer: "Nucleations"
## with 10513 features
## It has 3 fields
plot(polyg)
plot(dem, add=T, col=terrain.colors(10))
points(settl, pch=19, cex=0.1)
# Omitting points that fall outside of the polygon and converting the data into spastat object
settl <- settl[polyg, ]
sp_settl <- as.ppp(coordinates(settl),as.owin(polyg))
# Histogram of distribution of distances and mean value
hist(nndist(sp_settl), xlim=c(0,6000), breaks=100)
mean(nndist(sp_settl))
## [1] 2059.87
# Clark and Evans test
# R > 1 (and p << 0.05) means that the sites are dispersed
clarkevans.test(sp_settl, corrections="all")
##
## Clark-Evans test
## No edge correction
## Z-test
##
## data: sp_settl
## R = 1.1709, p-value < 2.2e-16
## alternative hypothesis: two-sided
# RECALLING PREVIOUS POLYGON TO THIS R CHUNK, OTHERWISE ADDING THE NEW SUBAREAS DIDN'T WORK
polyg <- readOGR(dsn="england", layer="england_historic")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\Uine\Documents\R\SADE\week3\england", layer: "england_historic"
## with 1 features
## It has 3 fields
dem <- raster("dem_england/dem_england_historic.tif")
settl <- readOGR(dsn="nucleations", layer="Nucleations")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\Uine\Documents\R\SADE\week3\nucleations", layer: "Nucleations"
## with 10513 features
## It has 3 fields
plot(polyg)
plot(dem, add=T, col=terrain.colors(10))
points(settl, pch=19, cex=0.1)
settl <- settl[polyg, ]
sp_settl <- as.ppp(coordinates(settl),as.owin(polyg))
# Loading two polygons describing the new study areas
south <- readOGR(dsn="southernengland", layer="eng_south")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\Uine\Documents\R\SADE\week3\southernengland", layer: "eng_south"
## with 1 features
## It has 1 fields
mid <- readOGR(dsn="midlands", layer="midlands")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\Uine\Documents\R\SADE\week3\midlands", layer: "midlands"
## with 1 features
## It has 1 fields
plot(south, border="red", add=T)
plot(mid, border="cyan", add=T)
# Creating objects that contain settlements that fall into these regions & creating spastat objects
south_settl <- settl[south, ]
sp_south_settl <- as.ppp(coordinates(south_settl),as.owin(south))
points(sp_south_settl, pch=19, cex=0.1, col="red")
mid_settl <- settl[mid, ]
sp_mid_settl <- as.ppp(coordinates(mid_settl),as.owin(mid))
points(sp_mid_settl, pch=19, cex=0.1, col="cyan")
# Plotting the point distributions
plot(south, main="Avon region")
plot(dem, add=T, col=terrain.colors(20))
points(sp_south_settl, pch=19, cex=0.5)
plot(mid, main="Midlands")
plot(dem, add=T, col=terrain.colors(20))
points(sp_mid_settl, pch=19, cex=0.5)
# Running Clark & Evants test
# Visually we can see that there is clustering of settlements near the waterways in "South", so the "dispersed" (R > 1) result for both of them seems not to capture the full picture
clarkevans.test(sp_south_settl, correction="none")
##
## Clark-Evans test
## No edge correction
## Z-test
##
## data: sp_south_settl
## R = 1.1343, p-value = 5.336e-12
## alternative hypothesis: two-sided
clarkevans.test(sp_mid_settl, correction="none")
##
## Clark-Evans test
## No edge correction
## Z-test
##
## data: sp_mid_settl
## R = 1.328, p-value < 2.2e-16
## alternative hypothesis: two-sided
# K function
# "Pois" is the expected Poisson line, the others are how the data compares to it with a few different edge corrections
k_func_south <- Kest(sp_south_settl)
dev.new(device=pdf)
plot(k_func_south, xlim=c(0,5000), main="K-Function South close-up")
# L function (Same as K but Pois is straightened)
l_func_south <- Lest(sp_south_settl)
plot(l_func_south, xlim=c(0,5000))
# PCF (pair correlation function)
# Same principle as K, but different process ("donut rings")
# Expected Poisson is the green horizontal line
pc_func_south <- pcf(sp_south_settl)
plot(pc_func_south, xlim=c(0,5000))
# Monte Carlo Simulation
# The grey envelope depicts results of random point patterns simulations in the study area, the line shows where our data deviates from this area of random distribution
pc_func_100_south <- envelope(sp_south_settl, pcf, nsim=99)
## Generating 99 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
## 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
## 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99.
##
## Done.
plot(pc_func_100_south, xlim=c(0,20000), main="South: PCF with 99 MC Simulations")
# Refined simulation with 999 simulations, and top and bottom 25 % removed in case of statistical outliers
# REDUCED nsim to 99 for the GitHub site as it takes too long to calculate again and again when knitting the html page
pc_func_1000_south <- envelope(sp_south_settl, pcf, nsim=99,
nrank=25)
## Generating 99 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
## 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
## 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99.
##
## Done.
plot(pc_func_1000_south, xlim=c(0,20000), main="South: PCF with 999 MC Simulations")
# Comparing to Midlands data: R value barely gets to random pattern envelope, settlement patterns seem to be not clustered
pc_func_1000_mid <- envelope(sp_mid_settl, pcf, nsim=99,
nrank=25)
## Generating 99 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
## 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
## 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99.
##
## Done.
plot(pc_func_1000_mid, xlim=c(0,5000), main="Midlands: PCF with 999 MC Simulations")
# Loading data: location of Oxford and Oxford pottery found
sites <- readOGR(dsn="pottery_sites", layer="pottery_sites")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\Uine\Documents\R\SADE\week3\pottery_sites", layer: "pottery_sites"
## with 30 features
## It has 3 fields
oxford <- readOGR(dsn="oxford/oxford.shp", layer="oxford")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\Uine\Documents\R\SADE\week3\oxford\oxford.shp", layer: "oxford"
## with 1 features
## It has 2 fields
## Integer64 fields read as strings: cat Id
plot(polyg)
points(oxford, pch=15, cex=2)
points(sites)
# Inspecting site names
head(sites, n=30)
## site_name oxpots id
## 1 Wroxeter 5.50 1
## 2 Droitwhich 10.75 2
## 3 Leicester 7.50 3
## 4 Gloucester 21.00 4
## 5 Caerwent 19.00 5
## 6 Gatcombe 21.00 6
## 7 Bath 21.25 7
## 8 Cirencester 22.50 8
## 9 Alchester 22.50 9
## 10 Verulamium 17.25 10
## 11 Dorchester on Thames 22.50 11
## 12 Mildenhall 17.50 12
## 13 Silchester 19.50 13
## 14 London 18.25 14
## 15 Ilchester 12.00 15
## 16 Durrington 11.50 16
## 17 Salisbury 1.50 17
## 18 East Anton 14.75 18
## 19 Winchester 8.50 19
## 20 Clausentum 6.25 20
## 21 Dorchester 5.85 21
## 22 Portchester 9.00 22
## 23 Chichester 6.75 23
## 24 Canterbury 17.50 24
## 25 Richborough 17.50 25
## 26 Colchester 7.00 26
## 27 Caister 4.00 27
## 28 Norwhich 4.00 28
## 29 Pevensey 7.75 29
## 30 Brough on Humber 1.50 30
# Dependant variable = the proportion of Oxford pottery at each sites (oxpots)
# Independent variable = distance of the site from Oxford
# Calculating these distances:
# Step 1: combine coordinate data to single list
coords <- rbind(coordinates(sites), coordinates(oxford))
# Step 2: distance matrix between sites
d_matrix <- as.matrix(dist(coords))
# Step 3: Adding "distox" as a column to "sites" data frame
sites$distox <- d_matrix[31,1:30]
# Scatter plot
plot(sites$distox,sites$oxpots)
# Linear regression analysis (lm) and adding the result to the scatterplot
res <- lm(sites$oxpots ~ sites$distox)
abline(res, col="red")
# Summary of the linear regression analysis results
summary(res)
##
## Call:
## lm(formula = sites$oxpots ~ sites$distox)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.9931 -4.0310 -0.6302 3.5874 10.8196
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.061e+01 2.261e+00 9.116 7.13e-10 ***
## sites$distox -7.490e-05 1.904e-05 -3.933 0.000503 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.693 on 28 degrees of freedom
## Multiple R-squared: 0.3559, Adjusted R-squared: 0.3329
## F-statistic: 15.47 on 1 and 28 DF, p-value: 0.0005027
# Residuals: adding new column to "sites"
sites$sr <- residuals(res) / summary(res)$sigma
# Taking a look at results with boxplot & scatterplot
boxplot(sites$sr)
plot(sites$distox,sites$sr)
# Mapping the residual values
plot(polyg)
points(oxford, pch=15, cex=2)
points(sites[sites$sr >=-3 & sites$sr <=-2,], pch=1,
cex=3, col="blue")
points(sites[sites$sr >=-2 & sites$sr <=-1,], pch=1,
cex=2, col="blue")
points(sites[sites$sr >=-1 & sites$sr <=0,], pch=1,
cex=1, col="blue")
points(sites[sites$sr >=0 & sites$sr <=1,], pch=1,
cex=1, col="red")
points(sites[sites$sr >=1 & sites$sr <=2,], pch=1,
cex=2, col="red")
# Setting working directory
setwd("~/R/SADE/week4")
# Setting up a toy dataset
soils <- c("clay", "morainic", "peat")
monuments <- c(21, 7, 24)
pct_land <- c(0.45, 0.25, 0.3)
mydata <- cbind.data.frame(soils, monuments, pct_land)
mydata
## soils monuments pct_land
## 1 clay 21 0.45
## 2 morainic 7 0.25
## 3 peat 24 0.30
# One-sample Chi-squared test
# Null hypothesis is that there is no significant relationship between the distribution of the monuments and the soil type areas they are found in
# Step 1: create column with the expected nr of monuments per soil area
mydata$expected <- mydata$pct_land * sum(mydata$monuments)
mydata
## soils monuments pct_land expected
## 1 clay 21 0.45 23.4
## 2 morainic 7 0.25 13.0
## 3 peat 24 0.30 15.6
# Step 2: Chi-squared test
chisq.test(mydata$monuments, p=mydata$pct_land)
##
## Chi-squared test for given probabilities
##
## data: mydata$monuments
## X-squared = 7.5385, df = 2, p-value = 0.02307
# X-squared = 7.5385, df = 2, p-value = 0.02307
# -> We can reject the null hypothesis (of the distribution being random) with 97.7 % confidence
# Two-sample Chi-squared test
soils <- c("clay", "morainic", "peat")
houses <- c(6, 3, 19)
manu <- c(10, 3, 2)
ritual <- c(5, 1, 3)
pct_land <- c(0.45, 0.25, 0.3)
newdata <- cbind.data.frame(soils, houses, manu, ritual,
pct_land)
newdata
## soils houses manu ritual pct_land
## 1 clay 6 10 5 0.45
## 2 morainic 3 3 1 0.25
## 3 peat 19 2 3 0.30
newdata2 <- data.frame(newdata$houses, newdata$manu,
newdata$ritual)
chisq.test(newdata2)
##
## Pearson's Chi-squared test
##
## data: newdata2
## X-squared = 12.919, df = 4, p-value = 0.01168
# X-squared = 12.919, df = 4, p-value = 0.01168
# p < 0.05, so we can reject null hypothesis
# Setting up
# Note: had to replace row.names="hordes" to row.names=1 for the code to work
library(ca)
mydata <- read.csv(file="romanhoards.csv", header=TRUE, sep=",", row.names=1)
summary(mydata)
## Antioch Alexandria Aquileia Arles
## Min. : 0.00 Min. : 0.00 Min. : 3.0 Min. : 3.00
## 1st Qu.: 0.25 1st Qu.: 0.00 1st Qu.: 4.5 1st Qu.: 5.00
## Median : 43.50 Median : 2.50 Median : 6.5 Median : 24.50
## Mean : 89.50 Mean : 9.00 Mean : 32.0 Mean : 70.83
## 3rd Qu.:118.25 3rd Qu.:19.25 3rd Qu.: 16.0 3rd Qu.:116.00
## Max. :321.00 Max. :25.00 Max. :153.0 Max. :228.00
## Constantinopolis Cyzicus Heracleia London
## Min. : 0.00 Min. : 1.00 Min. : 0.0 Min. : 1
## 1st Qu.: 0.00 1st Qu.: 7.75 1st Qu.: 7.5 1st Qu.: 1
## Median : 0.50 Median : 22.50 Median : 21.5 Median :18
## Mean : 44.67 Mean : 208.00 Mean : 213.8 Mean :27
## 3rd Qu.: 23.50 3rd Qu.: 89.75 3rd Qu.: 53.5 3rd Qu.:44
## Max. :236.00 Max. :1087.00 Max. :1173.0 Max. :77
## Lugdunum Nicomedia Rome Ostia
## Min. : 0.00 Min. : 2.00 Min. : 2.00 Min. :0.0000
## 1st Qu.: 0.00 1st Qu.: 5.25 1st Qu.: 14.00 1st Qu.:0.0000
## Median :21.00 Median : 12.50 Median : 28.00 Median :0.0000
## Mean :31.67 Mean : 92.00 Mean : 57.67 Mean :0.1667
## 3rd Qu.:63.75 3rd Qu.: 76.75 3rd Qu.: 35.25 3rd Qu.:0.0000
## Max. :77.00 Max. :424.00 Max. :241.00 Max. :1.0000
## Siscia Sirmium Thessalonica Ticinum
## Min. : 9.00 Min. : 0.0 Min. : 10.0 Min. : 4.0
## 1st Qu.: 14.75 1st Qu.: 0.0 1st Qu.: 18.0 1st Qu.: 7.5
## Median : 30.00 Median : 1.0 Median : 21.5 Median : 18.5
## Mean : 714.00 Mean :14.5 Mean : 311.5 Mean :109.3
## 3rd Qu.: 42.25 3rd Qu.: 2.0 3rd Qu.: 32.5 3rd Qu.: 73.0
## Max. :4159.00 Max. :83.0 Max. :1763.0 Max. :520.0
## Trier uncertain Total
## Min. : 0.00 Min. : 0.00 Min. : 74.0
## 1st Qu.: 0.25 1st Qu.: 0.00 1st Qu.: 328.8
## Median :105.00 Median : 5.50 Median : 723.5
## Mean :175.33 Mean :26.00 Mean : 2227.0
## 3rd Qu.:333.50 3rd Qu.:52.25 3rd Qu.: 975.0
## Max. :467.00 Max. :79.00 Max. :10585.0
# Using t() to flip rows to columns
t.mydata <- t(mydata)
t.mydata
## Tavistock Sq. Chavannes Nagyteteny Bulgaria Jezzine Nebek
## Antioch 0 0 129 1 321 86
## Alexandria 0 0 24 0 25 5
## Aquileia 7 19 153 3 6 4
## Arles 41 141 228 3 8 4
## Constantinopolis 1 0 236 0 31 0
## Cyzicus 4 1 1087 19 111 26
## Heracleia 5 0 1173 15 62 28
## London 47 77 35 1 1 1
## Lugdunum 71 77 42 0 0 0
## Nicomedia 5 2 424 6 96 19
## Rome 26 30 241 2 37 10
## Ostia 0 0 0 0 0 1
## Siscia 31 46 4159 9 29 10
## Sirmium 0 2 83 0 2 0
## Thessalonica 21 22 1763 10 36 17
## Ticinum 22 90 520 4 15 5
## Trier 375 467 209 1 0 0
## uncertain 11 66 79 0 0 0
## Total 667 1040 10585 74 780 216
# Removing the "total" row from the analysis data
t.mydata <- t.mydata[-c(19),]
summary(t.mydata)
## Tavistock Sq. Chavannes Nagyteteny Bulgaria
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.000
## 1st Qu.: 1.75 1st Qu.: 0.25 1st Qu.: 80.0 1st Qu.: 0.000
## Median : 9.00 Median : 20.50 Median : 218.5 Median : 1.500
## Mean : 37.06 Mean : 57.78 Mean : 588.1 Mean : 4.111
## 3rd Qu.: 29.75 3rd Qu.: 74.25 3rd Qu.: 496.0 3rd Qu.: 5.500
## Max. :375.00 Max. :467.00 Max. :4159.0 Max. :19.000
## Jezzine Nebek
## Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.25 1st Qu.: 0.25
## Median : 20.00 Median : 4.50
## Mean : 43.33 Mean :12.00
## 3rd Qu.: 36.75 3rd Qu.:15.25
## Max. :321.00 Max. :86.00
# CA from the data, excluding the "uncertain" and "Total" columns
mydata.ca <- ca(mydata[,1:17])
summary(mydata.ca)
##
## Principal inertias (eigenvalues):
##
## dim value % cum% scree plot
## 1 0.560747 61.8 61.8 ***************
## 2 0.325160 35.8 97.6 *********
## 3 0.012822 1.4 99.0
## 4 0.005746 0.6 99.6
## 5 0.003557 0.4 100.0
## -------- -----
## Total: 0.908032 100.0
##
##
## Rows:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | TvsS | 50 969 250 | -2093 960 388 | 206 9 6 |
## 2 | Chvn | 74 980 303 | -1904 973 477 | 155 6 5 |
## 3 | Ngyt | 796 1000 94 | 251 584 89 | -212 416 109 |
## 4 | Blgr | 6 77 4 | 235 76 1 | -15 0 0 |
## 5 | Jzzn | 59 995 277 | 590 82 37 | 1972 914 706 |
## 6 | Nebk | 16 930 72 | 545 74 9 | 1850 855 172 |
##
## Columns:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | Antc | 41 999 300 | 669 67 32 | 2498 932 780 |
## 2 | Alxn | 4 986 15 | 581 99 2 | 1737 887 38 |
## 3 | Aqul | 15 186 1 | -42 40 0 | -80 146 0 |
## 4 | Arls | 32 780 38 | -910 780 48 | 21 0 0 |
## 5 | Cnst | 20 727 5 | 375 699 5 | 75 28 0 |
## 6 | Cyzc | 95 900 16 | 370 882 23 | 53 18 1 |
## 7 | Hrcl | 97 933 15 | 353 863 22 | -100 70 3 |
## 8 | Lndn | 12 990 52 | -1936 980 82 | 195 10 1 |
## 9 | Lgdn | 14 983 65 | -2001 977 103 | 163 6 1 |
## 10 | Ncmd | 42 966 16 | 388 431 11 | 432 535 24 |
## 11 | Rome | 26 966 2 | -90 107 0 | 256 859 5 |
## 12 | Osti | 0 184 5 | 728 9 0 | 3245 175 2 |
## 13 | Sisc | 324 983 68 | 285 429 47 | -324 554 105 |
## 14 | Srmm | 7 934 1 | 279 485 1 | -268 449 1 |
## 15 | Thss | 142 990 22 | 278 554 19 | -247 437 26 |
## 16 | Tcnm | 50 496 5 | -152 266 2 | -141 230 3 |
## 17 | Trir | 80 997 375 | -2058 990 602 | 175 7 8 |
# The first two dimensions explain cumulatively 97.6 % of the variation
# So we can quite comfortably plot the data into 2-dimensional graph!
plot(mydata.ca)
# Plotting just the rows or just the columns with "what" argument
plot(mydata.ca, what=c("none","all"), main="Columns")
plot(mydata.ca, what=c("all","none"), main="Rows")
# Selecting which CA dimensions to plot to see it from different angles
plot(mydata.ca, dim=c(1,3))
# Reading data (had to tweak the row.names argument again - apparently it got translated to "ï..pottery" instead of "pottery")
baxter <- read.csv(file="baxter_burials.csv", header=TRUE, sep=",", row.names="ï..pottery")
# Exploring the data
structure(baxter)
## a b c d e f g h i j k l m n o p
## 1 0 0 0 0 0 0 2 2 1 1 0 0 0 1 5 3
## 2 0 0 0 0 0 0 1 0 0 1 1 7 0 0 0 0
## 3 0 2 0 0 0 1 0 1 0 0 0 0 0 0 0 0
## 4 0 8 0 2 3 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0
## 6 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0
## 7 0 4 0 0 2 0 0 1 0 0 0 0 1 0 2 0
## 8 0 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0
## 9 0 7 6 1 2 0 0 0 0 1 3 3 0 0 1 1
## 10 3 6 1 1 3 3 2 6 2 2 12 5 2 1 2 1
## 11 0 8 0 4 4 0 0 0 0 0 0 0 0 0 0 0
## 12 1 1 0 0 0 0 0 2 0 0 1 0 0 0 0 0
## 13 0 2 12 0 0 0 1 0 0 0 1 0 0 0 0 0
## 14 0 1 0 0 0 0 0 2 0 0 2 0 0 0 0 0
## 15 0 0 0 0 0 0 1 1 1 1 1 0 7 3 4 2
## 16 1 1 0 0 0 1 2 0 1 0 4 1 9 4 0 0
## 17 0 0 0 0 0 0 0 0 0 0 7 0 0 0 1 0
## 18 2 2 0 0 0 1 0 0 0 0 3 0 0 0 0 0
## 19 0 0 0 0 0 0 0 1 3 1 0 2 3 2 7 0
## 20 0 5 0 0 2 0 0 0 0 0 1 0 0 0 0 0
## 21 0 0 0 0 0 0 1 0 0 0 2 0 1 0 4 0
## 22 1 0 0 0 0 2 0 0 0 0 0 1 0 0 0 0
## 23 0 1 0 3 0 0 0 0 0 0 0 0 0 0 0 0
## 24 0 3 0 0 3 0 0 0 0 0 0 0 0 0 0 0
## 25 1 8 26 3 1 0 0 0 0 0 0 0 0 0 0 1
## 26 0 0 0 0 0 0 0 2 1 2 0 0 13 1 4 4
## 27 1 4 0 0 0 2 0 1 2 4 14 2 1 3 0 0
## 28 1 1 13 1 0 2 0 0 0 0 0 0 0 0 0 0
## 29 0 3 14 0 0 0 1 0 0 0 1 0 0 0 0 0
## 30 1 6 0 0 0 1 1 1 0 0 4 0 0 0 1 1
## 31 10 14 2 0 0 11 6 16 3 5 24 27 4 5 2 2
## 32 1 0 0 0 0 3 0 0 0 0 0 1 0 0 0 0
## 33 0 0 0 0 0 0 0 0 0 1 5 2 0 0 0 0
## 34 0 4 0 0 0 1 0 0 0 0 4 1 2 1 7 0
## 35 4 5 0 0 0 1 0 1 0 0 3 0 0 0 0 0
## 36 0 0 0 0 0 0 1 0 1 1 1 0 0 2 2 1
## 37 0 0 0 0 0 0 0 0 3 0 0 1 1 4 0 1
## 38 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0
## 39 2 13 0 2 2 5 7 5 3 5 15 5 0 4 6 3
## 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 1
## 41 5 9 3 4 0 5 7 7 11 9 13 13 14 8 11 12
## 42 0 1 0 0 0 0 0 0 0 1 0 0 8 0 1 1
## 43 0 0 0 0 0 0 1 2 1 2 0 0 0 0 0 0
## 44 0 0 0 0 0 0 1 5 0 1 0 3 0 0 0 0
## 45 0 0 1 0 0 1 0 2 1 1 2 2 0 0 0 0
## 46 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0
## 47 0 5 0 0 1 2 0 0 0 0 2 0 0 0 2 0
## 48 1 3 1 0 1 5 2 5 0 0 9 2 0 0 0 1
## 49 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0
## 50 1 7 0 1 1 3 1 4 3 3 9 2 1 2 4 4
## 51 3 12 0 0 2 1 0 7 1 1 1 2 0 0 0 0
## 52 0 1 0 0 0 1 1 2 0 0 3 0 0 0 0 0
boxplot(baxter)
# CA for the data
# In the plots (drawn from the first three dimensions) burials are shown as red triangles and pottery types as blue dots
baxter.ca <- ca(baxter)
summary(baxter.ca)
##
## Principal inertias (eigenvalues):
##
## dim value % cum% scree plot
## 1 0.604014 27.7 27.7 *******
## 2 0.375853 17.2 44.9 ****
## 3 0.292531 13.4 58.4 ***
## 4 0.152558 7.0 65.4 **
## 5 0.151621 7.0 72.3 **
## 6 0.125624 5.8 78.1 *
## 7 0.105033 4.8 82.9 *
## 8 0.090072 4.1 87.0 *
## 9 0.082453 3.8 90.8 *
## 10 0.051150 2.3 93.1 *
## 11 0.042659 2.0 95.1
## 12 0.036629 1.7 96.8
## 13 0.027803 1.3 98.1
## 14 0.023567 1.1 99.1
## 15 0.018914 0.9 100.0
## -------- -----
## Total: 2.180481 100.0
##
##
## Rows:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | 1 | 15 279 18 | 544 110 7 | -674 169 18 |
## 2 | 2 | 10 34 25 | 380 26 2 | 214 8 1 |
## 3 | 3 | 4 245 5 | -24 0 0 | 820 245 7 |
## 4 | 4 | 13 433 27 | -477 49 5 | 1334 383 60 |
## 5 | 5 | 2 71 10 | -337 10 0 | 830 61 4 |
## 6 | 6 | 2 72 5 | 260 12 0 | 569 60 2 |
## 7 | 7 | 10 94 11 | 26 0 0 | 484 94 6 |
## 8 | 8 | 4 320 7 | -1027 288 7 | 341 32 1 |
## 9 | 9 | 24 680 11 | -777 625 24 | 231 55 3 |
## 10 | 10 | 51 588 5 | 168 130 2 | 314 458 13 |
## 11 | 11 | 16 376 41 | -542 51 8 | 1366 324 78 |
## 12 | 12 | 5 166 6 | 182 12 0 | 650 154 5 |
## 13 | 13 | 16 953 46 | -2359 867 144 | -745 86 23 |
## 14 | 14 | 5 166 6 | 222 20 0 | 606 146 5 |
## 15 | 15 | 21 805 21 | 666 203 15 | -1147 602 72 |
## 16 | 16 | 23 398 25 | 558 132 12 | -792 266 39 |
## 17 | 17 | 8 49 16 | 367 31 2 | 284 18 2 |
## 18 | 18 | 8 216 8 | 89 3 0 | 702 213 10 |
## 19 | 19 | 19 495 21 | 646 166 13 | -909 329 41 |
## 20 | 20 | 8 404 15 | -296 21 1 | 1264 383 33 |
## 21 | 21 | 8 223 13 | 563 85 4 | -715 138 11 |
## 22 | 22 | 4 63 11 | 223 8 0 | 582 55 4 |
## 23 | 23 | 4 86 42 | -834 30 5 | 1150 57 14 |
## 24 | 24 | 6 272 27 | -411 17 2 | 1613 255 41 |
## 25 | 25 | 39 994 89 | -2182 958 308 | -427 37 19 |
## 26 | 26 | 26 628 45 | 705 134 22 | -1353 494 129 |
## 27 | 27 | 33 141 15 | 327 106 6 | 188 35 3 |
## 28 | 28 | 18 936 50 | -2314 873 156 | -622 63 18 |
## 29 | 29 | 19 958 53 | -2335 880 168 | -697 78 24 |
## 30 | 30 | 16 276 6 | 88 9 0 | 485 267 10 |
## 31 | 31 | 128 252 28 | 247 127 13 | 245 125 21 |
## 32 | 32 | 5 55 17 | 218 6 0 | 597 48 5 |
## 33 | 33 | 8 89 10 | 362 47 2 | 343 42 2 |
## 34 | 34 | 20 137 15 | 379 88 5 | -280 48 4 |
## 35 | 35 | 14 234 16 | 22 0 0 | 764 234 21 |
## 36 | 36 | 9 334 9 | 560 137 5 | -669 196 10 |
## 37 | 37 | 10 211 26 | 645 71 7 | -903 140 21 |
## 38 | 38 | 2 42 22 | 641 17 1 | -780 25 3 |
## 39 | 39 | 75 395 9 | 228 200 6 | 225 195 10 |
## 40 | 40 | 4 176 16 | 630 45 3 | -1079 132 12 |
## 41 | 41 | 128 638 19 | 320 314 22 | -325 325 36 |
## 42 | 42 | 12 424 34 | 683 74 9 | -1482 350 69 |
## 43 | 43 | 6 51 12 | 469 48 2 | -101 2 0 |
## 44 | 44 | 10 62 18 | 400 40 3 | 293 22 2 |
## 45 | 45 | 10 11 5 | 44 2 0 | 100 9 0 |
## 46 | 46 | 2 72 5 | 260 12 0 | 569 60 2 |
## 47 | 47 | 12 251 9 | 14 0 0 | 629 251 12 |
## 48 | 48 | 29 270 12 | 109 14 1 | 470 256 17 |
## 49 | 49 | 2 77 8 | 492 27 1 | -672 50 2 |
## 50 | 50 | 45 336 5 | 279 300 6 | 96 36 1 |
## 51 | 51 | 29 432 19 | 17 0 0 | 779 432 47 |
## 52 | 52 | 8 187 6 | 243 37 1 | 490 150 5 |
##
## Columns:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | a | 38 112 30 | 93 5 1 | 428 107 19 |
## 2 | b | 145 552 60 | -271 82 18 | 650 470 163 |
## 3 | c | 78 984 237 | -2440 903 772 | -729 81 111 |
## 4 | d | 23 148 82 | -774 79 23 | 723 69 33 |
## 5 | e | 26 333 69 | -368 24 6 | 1328 310 124 |
## 6 | f | 53 103 43 | 153 13 2 | 402 90 23 |
## 7 | g | 39 70 21 | 242 50 4 | -154 20 2 |
## 8 | h | 74 154 43 | 316 79 12 | 310 75 19 |
## 9 | i | 39 191 45 | 498 99 16 | -478 92 24 |
## 10 | j | 42 206 19 | 407 172 12 | -180 34 4 |
## 11 | k | 148 181 56 | 251 76 15 | 295 105 34 |
## 12 | l | 81 87 53 | 293 60 12 | 193 26 8 |
## 13 | m | 66 569 107 | 665 124 48 | -1258 444 276 |
## 14 | n | 40 345 35 | 542 155 19 | -601 190 38 |
## 15 | o | 68 306 74 | 522 115 31 | -671 190 82 |
## 16 | p | 38 372 26 | 391 103 10 | -633 270 41 |
plot(baxter.ca)
plot(baxter.ca, dim=c(1,3))
plot(baxter.ca, dim=c(2,3))
# Removing row c as an outlier
baxter.ca <- ca(baxter[,-c(3)])
plot(baxter.ca)
# Setting up and renaming "ï..height" back to "height"
setwd("~/R/SADE/week5")
library(cluster)
peacock<-read.csv(file="peacock.csv", header=TRUE, sep=",")
colnames(peacock)[colnames(peacock) == 'ï..height'] <- 'height'
# Exploring the data
# The "las=2" in boxplot flips the x axis labels so they don't overlap
summary(peacock)
## height diameter hopper_slope hopper_orifice
## Min. : 39.00 Min. : 35.00 Min. :19.00 Min. : 6.00
## 1st Qu.: 64.00 1st Qu.: 66.50 1st Qu.:37.25 1st Qu.:15.50
## Median : 70.00 Median : 75.00 Median :41.00 Median :20.00
## Mean : 70.79 Mean : 72.93 Mean :40.90 Mean :19.74
## 3rd Qu.: 81.25 3rd Qu.: 80.00 3rd Qu.:46.00 3rd Qu.:25.00
## Max. :100.00 Max. :102.00 Max. :57.00 Max. :30.00
## thickness mount_protrusion mount_width mount_height
## Min. : 0.000 Min. :0.00 Min. :15.00 Min. :11.50
## 1st Qu.: 2.500 1st Qu.:0.50 1st Qu.:25.25 1st Qu.:20.50
## Median : 3.250 Median :0.50 Median :30.00 Median :24.00
## Mean : 4.631 Mean :1.11 Mean :29.17 Mean :22.82
## 3rd Qu.: 6.000 3rd Qu.:1.00 3rd Qu.:34.00 3rd Qu.:25.00
## Max. :13.000 Max. :5.00 Max. :50.00 Max. :30.00
## m_aper_height m_aper_width mount_mortice girth_band
## Min. : 5.00 Min. : 6.00 Min. : 3.000 Min. :1.000
## 1st Qu.:10.00 1st Qu.:11.12 1st Qu.: 5.000 1st Qu.:1.000
## Median :11.00 Median :12.00 Median : 5.500 Median :1.000
## Mean :10.77 Mean :11.89 Mean : 5.714 Mean :1.429
## 3rd Qu.:12.00 3rd Qu.:12.50 3rd Qu.: 6.000 3rd Qu.:2.000
## Max. :16.50 Max. :17.00 Max. :12.000 Max. :2.000
## type
## Length:42
## Class :character
## Mode :character
##
##
##
boxplot(peacock[-c(12:13)], las=2)
# Scaling the data
peacock.std<-scale(peacock[1:11])
peacock.std
## height diameter hopper_slope hopper_orifice thickness
## [1,] -0.71959107 -1.278452126 -0.8250442 0.39711929 -0.80081772
## [2,] -0.31928875 0.156435070 -0.5860659 1.27496194 -0.19205131
## [3,] -0.05242054 -0.221166823 -0.4665767 -0.48072335 0.87328991
## [4,] -0.31928875 -0.749809475 0.7283149 0.92382488 -0.49643451
## [5,] -0.25257170 -0.372207581 -0.3470876 0.04598223 -0.80081772
## [6,] 0.34788178 0.382996206 0.8478041 0.57268782 -0.64862612
## [7,] 0.28116472 0.005394313 0.4893366 0.57268782 -0.49643451
## [8,] -2.05393213 -2.260217049 -1.9004467 -1.88527159 0.11233190
## [9,] 0.14773062 0.231955449 0.6088257 1.80166753 -0.49643451
## [10,] 1.74893988 2.195485296 1.0867824 0.92382488 0.41671510
## [11,] 1.54878873 1.062679615 1.3257607 -1.00742894 0.72109831
## [12,] 0.14773062 0.685077721 0.6088257 -0.12958630 0.41671510
## [13,] -0.45272286 -0.749809475 -0.9445334 0.04598223 -0.34424291
## [14,] 0.08101356 -0.598768717 -0.3470876 -0.83186041 -0.49643451
## [15,] -1.98721507 -2.864380079 -2.4978925 -2.41197718 0.11233190
## [16,] 0.21444767 0.156435070 -0.1081092 0.57268782 0.11233190
## [17,] -0.18585465 1.062679615 0.4893366 -1.00742894 1.63424792
## [18,] 0.88161820 0.231955449 0.2503582 -0.48072335 2.24301434
## [19,] 0.01429651 0.156435070 0.4893366 -0.48072335 -0.19205131
## [20,] -0.85302518 -0.070126066 -0.1081092 0.04598223 -0.80081772
## [21,] -0.05242054 0.005394313 -0.1081092 1.09939341 -0.80081772
## [22,] 0.81490114 0.760598100 1.4452499 -0.83186041 -0.03985971
## [23,] 1.21520346 1.138199994 1.9232065 0.92382488 -0.80081772
## [24,] 0.88161820 0.911638858 0.7283149 0.92382488 0.11233190
## [25,] 0.21444767 0.005394313 -0.1081092 -1.35856600 1.93863113
## [26,] -0.38600581 1.138199994 -0.1081092 1.27496194 1.17767312
## [27,] -0.45272286 -0.221166823 0.1308691 0.22155076 -0.64862612
## [28,] -2.12064918 -2.637818943 -2.6173816 -2.23640865 -0.03985971
## [29,] -0.45272286 0.382996206 0.3698474 -0.83186041 -0.64862612
## [30,] 1.14848641 0.987159236 0.7283149 -0.48072335 1.02548151
## [31,] 0.21444767 0.534036964 0.3698474 0.04598223 -0.64862612
## [32,] -0.05242054 -0.523248338 -0.2275984 -0.30515483 -1.40958413
## [33,] -0.71959107 -0.598768717 -1.3030009 0.92382488 -1.25739253
## [34,] -0.31928875 0.231955449 0.3698474 0.92382488 -0.64862612
## [35,] -0.58615696 -0.598768717 -0.4665767 -0.83186041 -0.49643451
## [36,] -0.45272286 0.005394313 -0.2275984 1.27496194 -0.64862612
## [37,] 1.68222283 0.156435070 0.4893366 -1.18299747 2.24301434
## [38,] -1.85378097 -0.900850232 -1.7809575 0.92382488 -0.49643451
## [39,] 1.94909104 1.289240751 1.3257607 0.22155076 2.54739754
## [40,] 1.41535462 0.534036964 -0.1081092 -0.30515483 0.72109831
## [41,] -1.11989339 -0.221166823 -0.7055551 0.92382488 -0.64862612
## [42,] 0.81490114 0.458516585 1.0867824 0.22155076 -1.40958413
## mount_protrusion mount_width mount_height m_aper_height m_aper_width
## [1,] -0.89881701 -2.04383044 -1.64571371 0.6244279 0.04973618
## [2,] 0.31632187 0.12022532 0.04308151 0.6244279 0.04973618
## [3,] -0.49377072 0.12022532 0.28433797 0.6244279 0.04973618
## [4,] -0.49377072 0.69730686 0.76685089 0.3698068 0.28183836
## [5,] -0.49377072 0.26449570 0.28433797 0.1151857 0.51394054
## [6,] -0.49377072 -0.02404506 0.76685089 0.1151857 0.04973618
## [7,] -0.49377072 0.40876609 0.52559443 0.1151857 0.04973618
## [8,] -0.08872443 -1.75528967 -2.36948309 -2.1764040 -2.50338780
## [9,] -0.49377072 -0.45685622 -0.07754672 0.1151857 -0.18236600
## [10,] -0.65578924 1.70719954 1.73187673 1.6429122 1.44234926
## [11,] -0.49377072 0.98584762 1.49062027 0.6244279 0.97814490
## [12,] -0.65578924 0.69730686 1.00810735 0.1151857 0.04973618
## [13,] 1.93650704 -0.74539698 -0.19817495 -0.9032986 -1.34287690
## [14,] -0.49377072 -0.60112660 0.52559443 -0.3940564 0.04973618
## [15,] -0.49377072 -1.89956005 -2.61073955 -2.1764040 -2.73548998
## [16,] -0.08872443 0.69730686 0.28433797 0.1151857 0.04973618
## [17,] -0.49377072 0.12022532 0.28433797 0.1151857 0.04973618
## [18,] -0.89881701 -0.31258583 -0.19817495 -0.3940564 0.04973618
## [19,] -0.49377072 -0.31258583 0.52559443 -0.3940564 0.04973618
## [20,] 0.72136816 -0.02404506 0.28433797 -0.6486775 -0.41446818
## [21,] -0.49377072 0.26449570 -0.19817495 -0.3940564 0.04973618
## [22,] -0.49377072 3.00563300 0.52559443 -0.3940564 0.04973618
## [23,] -0.49377072 0.55303647 -0.68068787 0.8790490 0.74604272
## [24,] -0.49377072 0.40876609 1.24936381 2.9160177 0.74604272
## [25,] -0.89881701 -0.88966737 -1.04257256 -0.6486775 -0.87867254
## [26,] -0.49377072 0.69730686 0.76685089 0.8790490 0.97814490
## [27,] -0.08872443 -0.31258583 -0.19817495 -0.3940564 -0.41446818
## [28,] -0.49377072 -2.04383044 -2.73136778 -2.9402673 -2.50338780
## [29,] -0.08872443 -0.02404506 0.52559443 0.3698068 -0.41446818
## [30,] -0.49377072 0.84157724 -0.19817495 0.1151857 0.04973618
## [31,] -0.49377072 -0.02404506 1.00810735 -0.3940564 0.04973618
## [32,] 1.53146075 -1.32247852 -0.68068787 -1.4125408 -0.87867254
## [33,] 2.74659963 -0.60112660 -0.68068787 -0.3940564 -0.41446818
## [34,] -0.08872443 0.55303647 0.52559443 0.8790490 0.97814490
## [35,] 2.74659963 -1.03393775 -0.68068787 0.1151857 0.04973618
## [36,] 0.72136816 0.26449570 0.28433797 0.8790490 0.04973618
## [37,] -0.49377072 0.12022532 0.52559443 0.1151857 0.28183836
## [38,] -0.08872443 -1.17820813 -0.19817495 0.6244279 0.04973618
## [39,] 0.72136816 0.84157724 1.00810735 0.3698068 2.37075798
## [40,] -0.08872443 0.69730686 0.52559443 0.6244279 0.97814490
## [41,] 3.15164592 0.69730686 -0.68068787 -0.6486775 -0.41446818
## [42,] -0.08872443 0.84157724 -0.68068787 0.6244279 1.90655362
## mount_mortice
## [1,] -1.4581228
## [2,] -0.1151150
## [3,] -0.9209197
## [4,] -0.3837165
## [5,] -0.3837165
## [6,] 0.1534866
## [7,] -0.1151150
## [8,] -1.4581228
## [9,] -0.3837165
## [10,] 0.1534866
## [11,] 0.4220882
## [12,] -0.1151150
## [13,] -0.3837165
## [14,] -0.3837165
## [15,] -1.4581228
## [16,] -0.1151150
## [17,] 3.3767055
## [18,] 3.3767055
## [19,] -0.3837165
## [20,] -0.1151150
## [21,] -0.1151150
## [22,] 0.1534866
## [23,] 0.4220882
## [24,] 0.1534866
## [25,] -0.6523181
## [26,] 0.1534866
## [27,] -0.9209197
## [28,] -1.4581228
## [29,] -0.6523181
## [30,] 0.4220882
## [31,] -0.1151150
## [32,] -0.3837165
## [33,] -0.3837165
## [34,] 0.1534866
## [35,] 0.1534866
## [36,] -0.1151150
## [37,] -0.1151150
## [38,] 0.6906898
## [39,] 1.2278929
## [40,] 1.2278929
## [41,] -0.3837165
## [42,] 1.2278929
## attr(,"scaled:center")
## height diameter hopper_slope hopper_orifice
## 70.785714 72.928571 40.904762 19.738095
## thickness mount_protrusion mount_width mount_height
## 4.630952 1.109524 29.166667 22.821429
## m_aper_height m_aper_width mount_mortice
## 10.773810 11.892857 5.714286
## attr(,"scaled:scale")
## height diameter hopper_slope hopper_orifice
## 14.988672 13.241459 8.368960 5.695782
## thickness mount_protrusion mount_width mount_height
## 3.285332 1.234427 6.931430 4.144967
## m_aper_height m_aper_width mount_mortice
## 1.963702 2.154224 1.861493
boxplot(peacock.std, las=2)
# Summary statistics with agnes() command
agnes(peacock.std)
## Call: agnes(x = peacock.std)
## Agglomerative coefficient: 0.7512433
## Order of objects:
## [1] 1 38 13 32 35 33 41 2 36 4 34 9 5 16 21 6 7 19 31 12 14 20 27 29 3
## [26] 26 23 42 11 30 40 37 22 10 24 25 17 18 39 8 15 28
## Height (summary):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.7851 1.5464 2.0448 2.4766 3.2775 7.7150
##
## Available components:
## [1] "order" "height" "ac" "merge" "diss" "call" "method" "data"
# Cluster analysis (HCA): Single linkage
peacock.single <- hclust(dist(peacock.std, method="euclid"), method="single")
plot(peacock.single, main="Single Linkage")
# Outlining 5 clusters in the dendrogram
rect.hclust(peacock.single, k=5)
# Adding the assigned group as a column to the original data frame
peacock.single.5<-cutree(peacock.single,k=5)
peacock$single.5 <- peacock.single.5
peacock
## height diameter hopper_slope hopper_orifice thickness mount_protrusion
## 1 60 56 34 22 2.0 0.0
## 2 66 75 36 27 4.0 1.5
## 3 70 70 37 17 7.5 0.5
## 4 66 63 47 25 3.0 0.5
## 5 67 68 38 20 2.0 0.5
## 6 76 78 48 23 2.5 0.5
## 7 75 73 45 23 3.0 0.5
## 8 40 43 25 9 5.0 1.0
## 9 73 76 46 30 3.0 0.5
## 10 97 102 50 25 6.0 0.3
## 11 94 87 52 14 7.0 0.5
## 12 73 82 46 19 6.0 0.3
## 13 64 63 33 20 3.5 3.5
## 14 72 65 38 15 3.0 0.5
## 15 41 35 20 6 5.0 0.5
## 16 74 75 40 23 5.0 1.0
## 17 68 87 45 14 10.0 0.5
## 18 84 76 43 17 12.0 0.0
## 19 71 75 45 17 4.0 0.5
## 20 58 72 40 20 2.0 2.0
## 21 70 73 40 26 2.0 0.5
## 22 83 83 53 15 4.5 0.5
## 23 89 88 57 25 2.0 0.5
## 24 84 85 47 25 5.0 0.5
## 25 74 73 40 12 11.0 0.0
## 26 65 88 40 27 8.5 0.5
## 27 64 70 42 21 2.5 1.0
## 28 39 38 19 7 4.5 0.5
## 29 64 78 44 15 2.5 1.0
## 30 88 86 47 17 8.0 0.5
## 31 74 80 44 20 2.5 0.5
## 32 70 66 39 18 0.0 3.0
## 33 60 65 30 25 0.5 4.5
## 34 66 76 44 25 2.5 1.0
## 35 62 65 37 15 3.0 4.5
## 36 64 73 39 27 2.5 2.0
## 37 96 75 45 13 12.0 0.5
## 38 43 61 26 25 3.0 1.0
## 39 100 90 52 21 13.0 2.0
## 40 92 80 40 18 7.0 1.0
## 41 54 70 35 25 2.5 5.0
## 42 83 79 50 21 0.0 1.0
## mount_width mount_height m_aper_height m_aper_width mount_mortice girth_band
## 1 15 16.0 12.0 12.0 3.0 2
## 2 30 23.0 12.0 12.0 5.5 1
## 3 30 24.0 12.0 12.0 4.0 2
## 4 34 26.0 11.5 12.5 5.0 1
## 5 31 24.0 11.0 13.0 5.0 1
## 6 29 26.0 11.0 12.0 6.0 1
## 7 32 25.0 11.0 12.0 5.5 1
## 8 17 13.0 6.5 6.5 3.0 1
## 9 26 22.5 11.0 11.5 5.0 2
## 10 41 30.0 14.0 15.0 6.0 1
## 11 36 29.0 12.0 14.0 6.5 1
## 12 34 27.0 11.0 12.0 5.5 1
## 13 24 22.0 9.0 9.0 5.0 1
## 14 25 25.0 10.0 12.0 5.0 2
## 15 16 12.0 6.5 6.0 3.0 2
## 16 34 24.0 11.0 12.0 5.5 2
## 17 30 24.0 11.0 12.0 12.0 2
## 18 27 22.0 10.0 12.0 12.0 2
## 19 27 25.0 10.0 12.0 5.0 2
## 20 29 24.0 9.5 11.0 5.5 1
## 21 31 22.0 10.0 12.0 5.5 1
## 22 50 25.0 10.0 12.0 6.0 1
## 23 33 20.0 12.5 13.5 6.5 1
## 24 32 28.0 16.5 13.5 6.0 1
## 25 23 18.5 9.5 10.0 4.5 2
## 26 34 26.0 12.5 14.0 6.0 1
## 27 27 22.0 10.0 11.0 4.0 1
## 28 15 11.5 5.0 6.5 3.0 1
## 29 29 25.0 11.5 11.0 4.5 2
## 30 35 22.0 11.0 12.0 6.5 2
## 31 29 27.0 10.0 12.0 5.5 1
## 32 20 20.0 8.0 10.0 5.0 2
## 33 25 20.0 10.0 11.0 5.0 2
## 34 33 25.0 12.5 14.0 6.0 1
## 35 22 20.0 11.0 12.0 6.0 2
## 36 31 24.0 12.5 12.0 5.5 2
## 37 30 25.0 11.0 12.5 5.5 1
## 38 21 22.0 12.0 12.0 7.0 2
## 39 35 27.0 11.5 17.0 8.0 1
## 40 34 25.0 12.0 14.0 8.0 1
## 41 34 20.0 9.5 11.0 5.0 2
## 42 35 20.0 12.0 16.0 8.0 1
## type single.5
## 1 B 1
## 2 C 2
## 3 C 2
## 4 C 2
## 5 C 2
## 6 C 2
## 7 C 2
## 8 A 3
## 9 C 2
## 10 D 2
## 11 D 2
## 12 C 2
## 13 B 2
## 14 C 2
## 15 A 3
## 16 C 2
## 17 D 4
## 18 D 4
## 19 C 2
## 20 C 2
## 21 C 2
## 22 D 2
## 23 D 2
## 24 D 2
## 25 C 2
## 26 D 2
## 27 C 2
## 28 A 3
## 29 C 2
## 30 D 2
## 31 C 2
## 32 B 2
## 33 B 2
## 34 C 2
## 35 B 2
## 36 C 2
## 37 D 2
## 38 B 2
## 39 E 5
## 40 E 2
## 41 C 2
## 42 E 2
# Cross-tabulating the new groups vector (groups 1-5) with the original "type" column (types A-E)
# In our experimental groups, a few outliers dominate the "typing" (group 2 has 35 mills!)
addmargins(table(peacock.single.5,peacock[,13]))
##
## peacock.single.5 A B C D E Sum
## 1 0 1 0 0 0 1
## 2 0 5 20 8 2 35
## 3 3 0 0 0 0 3
## 4 0 0 0 2 0 2
## 5 0 0 0 0 1 1
## Sum 3 6 20 10 3 42
# Average method (HCA)
peacock.ave<-
hclust(dist(peacock.std,method="euclid"),method="ave")
# Wards method (HCA)
peacock.ward<-
hclust(dist(peacock.std,method="euclid"),method="ward")
# Plotting all three methods for comparison
plot(peacock.single,main="Single Linkage")
rect.hclust(peacock.single,k=5)
plot(peacock.ave,main="Average Linkage")
rect.hclust(peacock.ave,k=5)
plot(peacock.ward,main="Ward")
rect.hclust(peacock.ward,k=5)
# Examinimg the new groups with cross tabulation
peacock.ave.5<-cutree(peacock.ave,k=5)
peacock.ward.5<-cutree(peacock.ward,k=5)
addmargins(table(peacock.single.5,peacock[,13]))
##
## peacock.single.5 A B C D E Sum
## 1 0 1 0 0 0 1
## 2 0 5 20 8 2 35
## 3 3 0 0 0 0 3
## 4 0 0 0 2 0 2
## 5 0 0 0 0 1 1
## Sum 3 6 20 10 3 42
addmargins(table(peacock.ave.5,peacock[,13]))
##
## peacock.ave.5 A B C D E Sum
## 1 0 6 1 0 0 7
## 2 0 0 19 8 2 29
## 3 3 0 0 0 0 3
## 4 0 0 0 2 0 2
## 5 0 0 0 0 1 1
## Sum 3 6 20 10 3 42
addmargins(table(peacock.ward.5,peacock[,13]))
##
## peacock.ward.5 A B C D E Sum
## 1 0 2 19 0 0 21
## 2 3 0 0 0 0 3
## 3 0 0 0 8 3 11
## 4 0 4 1 0 0 5
## 5 0 0 0 2 0 2
## Sum 3 6 20 10 3 42
# This analysis for determining optimal amount of clusters is sensitive to the set seed
# But rerunning the analysis with a few different seeds shows that more often than not, 5 seems to be a good fit for this data
library(tidyverse)
set.seed(12)
wss <- function(k) {
kmeans(peacock.std, k)$tot.withinss
}
k.values <- 1:10
wss_values <- map_dbl(k.values, wss)
plot(k.values, wss_values,
type="b", pch = 19, frame = FALSE,
xlab="Number of clusters K",
ylab="Total within-clusters sum of squares")
# K-means partitioning
peacock.k5<-kmeans(peacock.std,centers=5)
peacock.k5
## K-means clustering with 5 clusters of sizes 9, 8, 5, 2, 18
##
## Cluster means:
## height diameter hopper_slope hopper_orifice thickness mount_protrusion
## 1 1.1114214 1.0542885 0.967293217 0.2605660 0.3828947 -0.2867471
## 2 -1.1365726 -1.3067723 -1.332873153 -0.7221301 -0.4964345 1.3795684
## 3 0.6280934 0.3225799 0.250358244 -0.7265193 1.2994264 -0.6071837
## 4 0.3478818 0.6473175 0.369847406 -0.7440761 1.9386311 -0.6962939
## 5 -0.2636912 -0.1078863 -0.001896653 0.4751498 -0.5471650 -0.2237399
## mount_width mount_height m_aper_height m_aper_width mount_mortice
## 1 1.08202788 0.65962580 0.90734021 1.13287969 0.5713113
## 2 -1.08803914 -1.32906461 -1.31705787 -1.34287690 -0.7194685
## 3 0.17793347 0.11545845 0.06426151 -0.08952513 -0.2762759
## 4 -0.09618026 0.04308151 -0.13943536 0.04973618 3.3767055
## 5 -0.09618026 0.22402386 0.12933134 0.04973618 -0.2643381
##
## Clustering vector:
## [1] 5 5 3 5 5 5 5 2 5 1 1 3 2 5 2 5 4 4 5 5 5 1 1 1 3 1 5 2 5 3 5 2 2 5 2 5 3 5
## [39] 1 1 2 1
##
## Within cluster sum of squares by cluster:
## [1] 50.908437 82.475658 14.881095 1.689131 52.154370
## (between_SS / total_SS = 55.2 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
addmargins(table(peacock.k5$cluster, peacock[,14]))
##
## 1 2 3 4 5 Sum
## 1 0 8 0 0 1 9
## 2 0 5 3 0 0 8
## 3 0 5 0 0 0 5
## 4 0 0 0 2 0 2
## 5 1 17 0 0 0 18
## Sum 1 35 3 2 1 42
# Defining my own color vector for the plot, then plotting the pairs
mycols<-c("black","red","blue","green","cyan")
plot(peacock[1:11],col=mycols[peacock.k5$cluster],pch=19, cex=0.5)
# Trying again with 3 klusters
peacock.k3<-kmeans(peacock.std,centers=3)
plot(peacock[1:11],col=mycols[peacock.k3$cluster],pch=19, cex=0.3)
# Plotting the height and diameter pair in better resolution
plot(peacock[,1]~peacock[,2],pch=16,col=mycols[peacock.k3$cluster])
# Carrying out PCA
peacockPCA<-prcomp(peacock[,1:11],center=T,scale=T)
plot(peacockPCA)
peacockPCA$sdev^2
## [1] 6.04208292 1.76178617 0.91801496 0.66495332 0.45364417 0.35802200
## [7] 0.26308415 0.21807116 0.15130706 0.09460398 0.07443011
summary(peacockPCA)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.4581 1.3273 0.95813 0.81545 0.67353 0.59835 0.51292
## Proportion of Variance 0.5493 0.1602 0.08346 0.06045 0.04124 0.03255 0.02392
## Cumulative Proportion 0.5493 0.7094 0.79290 0.85335 0.89459 0.92714 0.95105
## PC8 PC9 PC10 PC11
## Standard deviation 0.46698 0.38898 0.3076 0.27282
## Proportion of Variance 0.01982 0.01376 0.0086 0.00677
## Cumulative Proportion 0.97088 0.98463 0.9932 1.00000
# Plotting the PCA
biplot(peacockPCA,main="PCA biplot of Roman mills")
# Plotting with Peacock’s groups instead of numbers
biplot(peacockPCA,xlabs=peacock[,13], main="PCA biplot of mill
data with Peacock groups")
# Plotting with K-means groups instead of numbers
biplot(peacockPCA,xlabs=peacock.k5$cluster)
# Using FactoMineR for PCA
library("FactoMineR")
peacockPCA_FTMR<-PCA(peacock[1:11])
# Diagnostics with FactoMineR
round(peacockPCA_FTMR$eig,2)
## eigenvalue percentage of variance cumulative percentage of variance
## comp 1 6.04 54.93 54.93
## comp 2 1.76 16.02 70.94
## comp 3 0.92 8.35 79.29
## comp 4 0.66 6.05 85.33
## comp 5 0.45 4.12 89.46
## comp 6 0.36 3.25 92.71
## comp 7 0.26 2.39 95.11
## comp 8 0.22 1.98 97.09
## comp 9 0.15 1.38 98.46
## comp 10 0.09 0.86 99.32
## comp 11 0.07 0.68 100.00
dimdesc(peacockPCA_FTMR)
## $Dim.1
## $quanti
## correlation p.value
## diameter 0.9509599 5.431659e-22
## m_aper_width 0.9035309 2.623704e-16
## hopper_slope 0.8906081 2.868054e-15
## mount_height 0.8594395 3.201079e-13
## height 0.8586321 3.561697e-13
## mount_width 0.8270635 1.475047e-11
## m_aper_height 0.8041048 1.422684e-10
## mount_mortice 0.5873993 4.322354e-05
## hopper_orifice 0.4900349 9.872966e-04
##
## attr(,"class")
## [1] "condes" "list"
##
## $Dim.2
## $quanti
## correlation p.value
## hopper_orifice 0.7361042 2.773713e-08
## mount_protrusion 0.5285502 3.205013e-04
## mount_mortice -0.3141322 4.276708e-02
## thickness -0.8121858 6.637085e-11
##
## attr(,"class")
## [1] "condes" "list"
##
## $Dim.3
## $quanti
## correlation p.value
## mount_protrusion 0.7431472 1.735630e-08
## mount_mortice 0.5262374 3.442161e-04
##
## attr(,"class")
## [1] "condes" "list"
##
## $call
## $call$num.var
## [1] 1
##
## $call$proba
## [1] 0.05
##
## $call$weights
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [39] 1 1 1 1
##
## $call$X
## Dim.1 height diameter hopper_slope hopper_orifice thickness
## 1 -2.392608517 60 56 34 22 2.0
## 2 0.191382920 66 75 36 27 4.0
## 3 -0.078525088 70 70 37 17 7.5
## 4 0.678491086 66 63 47 25 3.0
## 5 -0.073759424 67 68 38 20 2.0
## 6 1.026468957 76 78 48 23 2.5
## 7 0.738676447 75 73 45 23 3.0
## 8 -6.107597146 40 43 25 9 5.0
## 9 0.412469316 73 76 46 30 3.0
## 10 4.486798042 97 102 50 25 6.0
## 11 2.915117831 94 87 52 14 7.0
## 12 1.252522191 73 82 46 19 6.0
## 13 -2.231543247 64 63 33 20 3.5
## 14 -0.736273084 72 65 38 15 3.0
## 15 -6.828208755 41 35 20 6 5.0
## 16 0.601644379 74 75 40 23 5.0
## 17 1.582376072 68 87 45 14 10.0
## 18 1.277186435 84 76 43 17 12.0
## 19 0.047794055 71 75 45 17 4.0
## 20 -0.827509978 58 72 40 20 2.0
## 21 -0.002046682 70 73 40 26 2.0
## 22 2.125103519 83 83 53 15 4.5
## 23 2.337175373 89 88 57 25 2.0
## 24 3.043995150 84 85 47 25 5.0
## 25 -1.291609041 74 73 40 12 11.0
## 26 1.914466246 65 88 40 27 8.5
## 27 -0.906433877 64 70 42 21 2.5
## 28 -7.071496267 39 38 19 7 4.5
## 29 -0.122516269 64 78 44 15 2.5
## 30 1.506544425 88 86 47 17 8.0
## 31 0.608690174 74 80 44 20 2.5
## 32 -2.260204496 70 66 39 18 0.0
## 33 -2.014192118 60 65 30 25 0.5
## 34 1.297895774 66 76 44 25 2.5
## 35 -1.600435011 62 65 37 15 3.0
## 36 0.342581203 64 73 39 27 2.5
## 37 1.251937992 96 75 45 13 12.0
## 38 -1.604329350 43 61 26 25 3.0
## 39 3.902996798 100 90 52 21 13.0
## 40 1.993368175 92 80 40 18 7.0
## 41 -1.397278369 54 70 35 25 2.5
## 42 2.010884159 83 79 50 21 0.0
## mount_protrusion mount_width mount_height m_aper_height m_aper_width
## 1 0.0 15 16.0 12.0 12.0
## 2 1.5 30 23.0 12.0 12.0
## 3 0.5 30 24.0 12.0 12.0
## 4 0.5 34 26.0 11.5 12.5
## 5 0.5 31 24.0 11.0 13.0
## 6 0.5 29 26.0 11.0 12.0
## 7 0.5 32 25.0 11.0 12.0
## 8 1.0 17 13.0 6.5 6.5
## 9 0.5 26 22.5 11.0 11.5
## 10 0.3 41 30.0 14.0 15.0
## 11 0.5 36 29.0 12.0 14.0
## 12 0.3 34 27.0 11.0 12.0
## 13 3.5 24 22.0 9.0 9.0
## 14 0.5 25 25.0 10.0 12.0
## 15 0.5 16 12.0 6.5 6.0
## 16 1.0 34 24.0 11.0 12.0
## 17 0.5 30 24.0 11.0 12.0
## 18 0.0 27 22.0 10.0 12.0
## 19 0.5 27 25.0 10.0 12.0
## 20 2.0 29 24.0 9.5 11.0
## 21 0.5 31 22.0 10.0 12.0
## 22 0.5 50 25.0 10.0 12.0
## 23 0.5 33 20.0 12.5 13.5
## 24 0.5 32 28.0 16.5 13.5
## 25 0.0 23 18.5 9.5 10.0
## 26 0.5 34 26.0 12.5 14.0
## 27 1.0 27 22.0 10.0 11.0
## 28 0.5 15 11.5 5.0 6.5
## 29 1.0 29 25.0 11.5 11.0
## 30 0.5 35 22.0 11.0 12.0
## 31 0.5 29 27.0 10.0 12.0
## 32 3.0 20 20.0 8.0 10.0
## 33 4.5 25 20.0 10.0 11.0
## 34 1.0 33 25.0 12.5 14.0
## 35 4.5 22 20.0 11.0 12.0
## 36 2.0 31 24.0 12.5 12.0
## 37 0.5 30 25.0 11.0 12.5
## 38 1.0 21 22.0 12.0 12.0
## 39 2.0 35 27.0 11.5 17.0
## 40 1.0 34 25.0 12.0 14.0
## 41 5.0 34 20.0 9.5 11.0
## 42 1.0 35 20.0 12.0 16.0
## mount_mortice
## 1 3.0
## 2 5.5
## 3 4.0
## 4 5.0
## 5 5.0
## 6 6.0
## 7 5.5
## 8 3.0
## 9 5.0
## 10 6.0
## 11 6.5
## 12 5.5
## 13 5.0
## 14 5.0
## 15 3.0
## 16 5.5
## 17 12.0
## 18 12.0
## 19 5.0
## 20 5.5
## 21 5.5
## 22 6.0
## 23 6.5
## 24 6.0
## 25 4.5
## 26 6.0
## 27 4.0
## 28 3.0
## 29 4.5
## 30 6.5
## 31 5.5
## 32 5.0
## 33 5.0
## 34 6.0
## 35 6.0
## 36 5.5
## 37 5.5
## 38 7.0
## 39 8.0
## 40 8.0
## 41 5.0
## 42 8.0
round(peacockPCA_FTMR$var$cor,2)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## height 0.86 -0.26 0.00 -0.20 0.03
## diameter 0.95 -0.02 0.07 -0.08 0.01
## hopper_slope 0.89 -0.05 -0.11 -0.26 -0.20
## hopper_orifice 0.49 0.74 -0.03 0.30 0.01
## thickness 0.29 -0.81 0.20 0.17 0.38
## mount_protrusion -0.23 0.53 0.74 -0.25 0.20
## mount_width 0.83 0.04 -0.01 -0.35 -0.03
## mount_height 0.86 0.10 -0.08 -0.09 0.19
## m_aper_height 0.80 0.28 -0.15 0.33 0.20
## m_aper_width 0.90 0.14 0.05 0.13 0.01
## mount_mortice 0.59 -0.31 0.53 0.33 -0.39
round(peacockPCA_FTMR$var$cos,2)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## height 0.74 0.07 0.00 0.04 0.00
## diameter 0.90 0.00 0.00 0.01 0.00
## hopper_slope 0.79 0.00 0.01 0.07 0.04
## hopper_orifice 0.24 0.54 0.00 0.09 0.00
## thickness 0.08 0.66 0.04 0.03 0.14
## mount_protrusion 0.05 0.28 0.55 0.06 0.04
## mount_width 0.68 0.00 0.00 0.12 0.00
## mount_height 0.74 0.01 0.01 0.01 0.04
## m_aper_height 0.65 0.08 0.02 0.11 0.04
## m_aper_width 0.82 0.02 0.00 0.02 0.00
## mount_mortice 0.35 0.10 0.28 0.11 0.15
# Colour plotting
# Had to tweak mycols to "as.factor" instead of "as.numeric" in cols2, as.numeric returned only NA:s
# Plotting with these color schemes doesn't work in this version of R
cols2 <- mycols[as.factor(peacock$type)]
cols2
## [1] "red" "blue" "blue" "blue" "blue" "blue" "blue" "black" "blue"
## [10] "green" "green" "blue" "red" "blue" "black" "blue" "green" "green"
## [19] "blue" "blue" "blue" "green" "green" "green" "blue" "green" "blue"
## [28] "black" "blue" "green" "blue" "red" "red" "blue" "red" "blue"
## [37] "green" "red" "cyan" "cyan" "blue" "cyan"
## plot(peacockPCA_FTMR, choix ='ind', col.ind=cols2, title="PCA and Peacock's Mills")
cols3<-mycols[as.numeric(peacock.k5$cluster)]
cols3
## [1] "cyan" "cyan" "blue" "cyan" "cyan" "cyan" "cyan" "red" "cyan"
## [10] "black" "black" "blue" "red" "cyan" "red" "cyan" "green" "green"
## [19] "cyan" "cyan" "cyan" "black" "black" "black" "blue" "black" "cyan"
## [28] "red" "cyan" "blue" "cyan" "red" "red" "cyan" "red" "cyan"
## [37] "blue" "cyan" "black" "black" "red" "black"
## plot(peacockPCA_FTMR, choix ='ind', col.ind=cols3, title="PCA and Peacock's Mills")
# Setting up
setwd("~/R/SADE/week6")
library(devtools)
# Downloading the archSeries package
devtools::install_github("davidcorton/archSeries")
library(archSeries)
# Data
mydata <- read.csv(file="mydata.csv", header=TRUE, sep=",")
head(mydata)
## objecttype broadperio fromdate todate old_findID
## 1 END SCRAPER NEOLITHIC -3500 -800 WILT-7FC4B2
## 2 END SCRAPER NEOLITHIC -2500 -2100 WILT-7FA0D7
## 3 END SCRAPER NEOLITHIC -2500 -2100 WILT-7F7A04
## 4 AWL BRONZE AGE -2150 -800 WILT-77CB78
## 5 ARMLET UNKNOWN -1200 410 WILT-BB14B5
## 6 SOCKETED AXEHEAD BRONZE AGE -1150 -800 WILT-CB58F4
nrow(mydata)
## [1] 1859
# Renaming columns to comply with archSeries
names(mydata)[names(mydata)=="fromdate"] <- "Start"
names(mydata)[names(mydata)=="todate"] <- "End"
# Coercing NA:s to zeros, then removing all the rows in which Start or End date is zero
mydata[is.na(mydata)] <- 0
mydata<-subset(mydata, Start!="0")
mydata<-subset(mydata, End!="0")
# Inspecting for typos in Start and End dates
mydata$subtract <- (mydata$End - mydata$Start)
mydata <- mydata[order(mydata$subtract),]
head(mydata)
## objecttype broadperio Start End old_findID subtract
## 1859 COIN MEDIEVAL 12222 1236 WILT-9105D6 -10986
## 1348 BROOCH IRON AGE 400 200 WILT-89BBAC -200
## 1655 STRAP END MEDIEVAL 1300 1250 WILT-1A4B16 -50
## 160 COIN IRON AGE 58 43 WILT-6F6F27 -15
## 161 COIN IRON AGE 58 43 WILT-6F2D79 -15
## 1003 COIN ROMAN 347 340 BM-B6331D -7
# Deleting the typoed rows with earlier End date than Start date
mydata <- mydata[!mydata$subtract < 0, ]
# Simplifying the data by selecting only the vectors we need for the analysis
mydata <- mydata[,c("objecttype", "Start","End")]
# Aoristic weighing + plot
aorist<-aorist(mydata, start.date=1000, end.date=1600, bin.width=20)
aorist.plot(aorist, opacity=80, ylab="Aoristic Sum")
summary(mydata)
## objecttype Start End
## Length:1844 Min. :-3500.0 Min. :-2100.0
## Class :character 1st Qu.: 275.0 1st Qu.: 326.0
## Mode :character Median : 337.0 Median : 361.0
## Mean : 533.4 Mean : 626.4
## 3rd Qu.: 911.5 3rd Qu.: 1200.0
## Max. : 1800.0 Max. : 1900.0
# Dividing data into "coin" and artifacts
coin <- mydata[mydata$objecttype == "COIN", ]
notcoin <- mydata[mydata$objecttype != "COIN", ]
# Aoristic weighing + plot for coins
aorist_coin<-aorist(coin, start.date=1000, end.date=1600, bin.width=20)
aorist.plot(aorist_coin, opacity=80, ylab="Aoristic Sum for Coin")
# Aoristic weighing + plot for artifacts
aorist_notcoin<-aorist(notcoin, start.date=1000, end.date=1600, bin.width=20)
aorist.plot(aorist_notcoin, opacity=80, ylab="Aoristic Sum for Coin")
# Creating unique ID row for the analysis
mydata$ID <- c(1:nrow(mydata))
# Simulating distributions
mydata_sim<-date.simulate(mydata, start.date=1000, end.date=1600, bin.width=20, reps=1000)
# Plotting the results
lines.chron(mydata_sim)
poly.chron(mydata_sim)
box.chron(mydata_sim)
# Same for coin and artifacts
coin$ID <- c(1:nrow(coin))
notcoin$ID <- c(1:nrow(notcoin))
# coin
coin_sim<-date.simulate(coin, start.date=1000, end.date=1600, bin.width=20, reps=1000)
lines.chron(coin_sim)
poly.chron(coin_sim)
box.chron(coin_sim)
# notcoin
notcoin_sim<-date.simulate(notcoin, start.date=1000, end.date=1600, bin.width=20, reps=1000)
lines.chron(notcoin_sim)
poly.chron(notcoin_sim)
box.chron(notcoin_sim)
# Creating data
data = seq(0,1, length=100)
# Applying a probability distribution function to it with different parameters of α and β:
# α = 1 and β = 1 (each bin is given equal importance)
plot(data, dbeta(data, 1, 1), ylab="density", type ="l", col="black")
# α = 2 and β = 2 (prioritizes the middle at the expense of edge bins)
plot(data, dbeta(data, 2, 2), ylab="density", type ="l", col="red")
# Applying to our earlier data with α = 2 and β = 2
mydata_sim_2<-date.simulate(mydata, start.date=1000, end.date=1600, bin.width=20, reps=1000, a=2, b=2)
lines.chron(mydata_sim_2)
poly.chron(mydata_sim_2)
box.chron(mydata_sim_2)
Using Dataset 1, a) plot medieval settlements in the county of Kent, and b) run K, L and Pair Correlation Function analysis with 99 runs of Monte Carlo simulation on the data.
# Setting up WD and necessary libraries
setwd("~/R/SADE/week7")
library(rgdal)
library(raster)
library(spatstat)
library(maptools)
library(GISTools)
# Loading and plotting the polygon "kent"
polyg <- readOGR(dsn="kent")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\Uine\Documents\R\SADE\week7\kent", layer: "kent"
## with 1 features
## It has 1 fields
plot(polyg)
# Adding Digital Elevation Model to the plot to visualize topography
dem <- raster("dem_england/dem_england_historic.tif")
plot(dem, add=T, col=terrain.colors(15))
# Adding the medieval settlements as points to the plot
sett <- readOGR(dsn="dataset1")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\Uine\Documents\R\SADE\week7\dataset1", layer: "dataset1"
## with 294 features
## It has 5 fields
## Integer64 fields read as strings: value_1086
points(sett, pch=19, cex=0.2)
# Omitting points that might fall outside of the polygon
sett <- sett[polyg, ]
# Inspecting how "dataset1" looks like
head(sett)
## OSREFS Min_Eastin Min_Northi vill_name value_1086
## 1 TQ4454 544000 154000 Westerham 40
## 2 TQ4655 546000 155000 Brasted 17
## 3 TQ4854 548000 154000 Sundridge 18
## 4 TQ5259 552000 159000 Otford 60
## 5 TQ5264 552000 164000 Lullingstone 5
## 6 TQ5465 554000 165000 Eynsford 20
summary(sett)
## Object of class SpatialPointsDataFrame
## Coordinates:
## min max
## coords.x1 544000 637000.0
## coords.x2 118000 176440.7
## Is projected: TRUE
## proj4string :
## [+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000
## +y_0=-100000 +ellps=airy +units=m +no_defs]
## Number of points: 294
## Data attributes:
## OSREFS Min_Eastin Min_Northi vill_name
## Length:294 Min. :544000 Min. :118000 Length:294
## Class :character 1st Qu.:574000 1st Qu.:146000 Class :character
## Mode :character Median :599500 Median :153500 Mode :character
## Mean :595755 Mean :152724
## 3rd Qu.:614000 3rd Qu.:160000
## Max. :637000 Max. :177000
## value_1086
## Length:294
## Class :character
## Mode :character
##
##
##
# Converting the data into spastat object for analysis
sp_sett <- as.ppp(coordinates(sett),as.owin(polyg))
# K-Function analysis
# "Pois" is the expected Poisson line, the others are how the data compares to it with a few different edge corrections
k_fun <- Kest(sp_sett)
plot(k_fun, xlim=c(0,5000), main="K-Function analysis")
# L-Function analysis
# Same as K but "Pois" is straightened to a straight line
l_fun <- Lest(sp_sett)
plot(l_fun, xlim=c(0,5000), main="L-Function analysis")
# Pair Correlation Function analysis (PCF)
# Same principle as K, but different process ("donut rings")
# Expected Poisson is the green horizontal line
pc_fun <- pcf(sp_sett)
plot(pc_fun, xlim=c(0,5000), main="Pair Correlation Function analysis")
# Monte Carlo Simulation
pc_fun_100 <- envelope(sp_sett, pcf, nsim=99)
## Generating 99 simulations of CSR ...
## 1, 2, [etd 5:40] 3, [etd 5:14] 4,
## [etd 6:07] 5, [etd 5:54] 6, [etd 5:49] 7, [etd 5:37] 8,
## [etd 5:46] 9, [etd 5:37] 10, [etd 5:38] 11, [etd 5:37] 12,
## [etd 5:27] 13, [etd 5:23] 14, [etd 5:21] 15, [etd 5:15] 16,
## [etd 5:10] 17, [etd 5:04] 18, [etd 5:04] 19, [etd 5:05] 20,
## [etd 5:00] 21, [etd 4:57] 22, [etd 4:53] 23, [etd 4:52] 24,
## [etd 4:50] 25, [etd 4:47] 26, [etd 4:45] 27, [etd 4:41] 28,
## [etd 4:39] 29, [etd 4:37] 30, [etd 4:36] 31, [etd 4:31] 32,
## [etd 4:29] 33, [etd 4:24] 34, [etd 4:20] 35, [etd 4:16] 36,
## [etd 4:11] 37, [etd 4:09] 38, [etd 4:06] 39, [etd 4:02] 40,
## [etd 3:56] 41, [etd 3:51] 42, [etd 3:48] 43, [etd 3:44] 44,
## [etd 3:39] 45, [etd 3:35] 46, [etd 3:31] 47, [etd 3:26] 48,
## [etd 3:22] 49, [etd 3:18] 50, [etd 3:14] 51, [etd 3:10] 52,
## [etd 3:06] 53, [etd 3:01] 54, [etd 2:58] 55, [etd 2:54] 56,
## [etd 2:50] 57, [etd 2:46] 58, [etd 2:41] 59, [etd 2:37] 60,
## [etd 2:33] 61, [etd 2:29] 62, [etd 2:25] 63, [etd 2:21] 64,
## [etd 2:16] 65, [etd 2:12] 66, [etd 2:09] 67, [etd 2:05] 68,
## [etd 2:01] 69, [etd 1:57] 70, [etd 1:53] 71, [etd 1:49] 72,
## [etd 1:45] 73, [etd 1:41] 74, [etd 1:37] 75, [etd 1:33] 76,
## [etd 1:29] 77, [etd 1:25] 78, [etd 1:21] 79, [etd 1:18] 80,
## [etd 1:14] 81, [etd 1:10] 82, [etd 1:06] 83, [etd 1:02] 84,
## [etd 58 sec] 85, [etd 54 sec] 86, [etd 51 sec] 87, [etd 47 sec] 88,
## [etd 43 sec] 89, [etd 39 sec] 90, [etd 35 sec] 91, [etd 31 sec] 92,
## [etd 27 sec] 93, [etd 23 sec] 94, [etd 19 sec] 95, [etd 16 sec] 96,
## [etd 12 sec] 97, [etd 8 sec] 98, [etd 4 sec] 99.
##
## Done.
plot(pc_fun_100, xlim=c(0,20000), main="South: PCF with 99 MC Simulations")
Using Dataset 2, a) plot the distribution of medieval object findspots in the county of Kent. b) Perform kernel density analysis on the findspots, using a few different search radii. c) Also perform a relative risk surface analysis, using the object type “seal matrix”.
Perform PCA on dataset 4.
Perform cluster analysis on dataset 4.